Random data

We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods. ...
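The selection idea in this abstract can be sketched roughly as follows: score each sentence of the general corpus by its cross-entropy under an in-domain model minus its cross-entropy under a general model, and keep the lowest-scoring sentences. The abstract does not specify the model type, so the add-one smoothed unigram models, the `<unk>` token, and the function names `train_unigram`, `cross_entropy`, and `select` are all assumptions made for illustration only.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumed placeholder for out-of-vocabulary tokens


def train_unigram(sentences, vocab):
    """Add-one smoothed unigram model over a fixed vocabulary (illustrative only)."""
    counts = Counter(
        tok if tok in vocab else UNK
        for sent in sentences
        for tok in sent.split()
    )
    total = sum(counts.values()) + len(vocab)
    return {word: (counts[word] + 1) / total for word in vocab}


def cross_entropy(sentence, model):
    """Per-token cross-entropy (in bits) of a sentence under a unigram model."""
    toks = [tok if tok in model else UNK for tok in sentence.split()]
    return -sum(math.log2(model[tok]) for tok in toks) / max(len(toks), 1)


def select(in_domain, general, keep):
    """Return the `keep` general-corpus sentences with the lowest
    cross-entropy difference H_in(s) - H_general(s)."""
    vocab = {tok for sent in in_domain + general for tok in sent.split()} | {UNK}
    lm_in = train_unigram(in_domain, vocab)
    lm_gen = train_unigram(general, vocab)
    ranked = sorted(
        general,
        key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_gen),
    )
    return ranked[:keep]


in_domain = ["the patient received a dose", "the dose was reduced for the patient"]
general = [
    "the patient was given a dose",
    "stocks fell sharply on monday",
    "the market closed lower",
]
selected = select(in_domain, general, keep=1)
```

In a realistic setting the unigram models would be replaced by stronger n-gram or neural language models, and `keep` would be tuned on held-out in-domain data.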

If you were to ask a random sampling of people what data analysis is, most would say that it is the process of calculating and summarizing data to get an answer to a question. In one sense, they are correct. However, the actions they are describing represent only a small part of the process known as data analysis.

Technical analysts often find a system or technical method that seems extremely profitable and convenient to follow, one that they think has been overlooked by the professionals. Sometimes they are right, but most often the method does not work in practical trading or over a longer period. Technical analysis uses price and related data to decide when to buy and sell. The methods used can be as interpretive as chart patterns and astrology, or as specific as mathematical formulas and spectral analysis. All factors that influence the markets are assumed to be netted out as the current price. ...

General list:
• No restrictions on which operations can be used on the list.
• No restrictions on where data can be inserted or deleted.
Unordered list (random list): data are not in any particular order.
Ordered list: data are arranged according to a key.
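The difference between the unordered and ordered lists described here can be illustrated with a minimal sketch. The helper names `insert_unordered` and `insert_ordered` are hypothetical, and a Python list plus the standard `bisect` module stand in for whatever list implementation the slides actually use.

```python
import bisect


def insert_unordered(items, value):
    """Unordered (random) list: a new value can simply go at the end."""
    items.append(value)


def insert_ordered(items, key):
    """Ordered list: insert at the position that keeps the keys sorted."""
    bisect.insort(items, key)


unordered, ordered = [], []
for k in [30, 10, 20]:
    insert_unordered(unordered, k)
    insert_ordered(ordered, k)
```

After the loop, `unordered` preserves arrival order while `ordered` is arranged by key, which is what makes binary search possible on the ordered variant.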

Nowadays, there are large amounts of data available to train statistical machine translation systems. However, it is not clear whether all the training data actually help or not. A system trained on a subset of such huge bilingual corpora might outperform the use of all the bilingual data. This paper studies such issues by analysing two training data selection techniques: one based on approximating the probability of an in-domain corpus, and another based on infrequent n-gram occurrence.

Frequency distribution models tuned to words and other linguistic events can predict the number of distinct types and their frequency distribution in samples of arbitrary sizes. We conduct, for the first time, a rigorous evaluation of these models based on cross-validation and separation of training and test data. Our experiments reveal that the prediction accuracy of the models is marred by serious overfitting problems, due to violations of the random sampling assumption in corpus data. We then propose a simple preprocessing method to alleviate such non-randomness problems. ...

In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages. We reduce the running time from quadratic to practically linear in the number of elements to be computed.

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. ...

This study examines the technical and scale efficiencies for a sample of irrigated and rainfed rice farmers in Anambra State, using data envelopment analysis (DEA). Two local government areas were purposively selected, and three communities were randomly selected from each, giving a total of six communities.

This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model's distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. ...

Discriminative feature-based methods are widely used in natural language processing, but sentence parsing is still dominated by generative methods. While prior feature-based dynamic programming parsers have restricted training and evaluation to artificially short sentences, we present the first general, feature-rich discriminative parser, based on a conditional random field model, which has been successfully scaled to the full WSJ parsing data.

This paper presents an efficient inference algorithm of conditional random fields (CRFs) for large-scale data. Our key idea is to decompose the output label state into an active set and an inactive set in which most unsupported transitions become a constant. Our method unifies two previous methods for efficient inference of CRFs, and also derives a simple but robust special case that performs faster than exact inference when the active sets are sufficiently small. We demonstrate that our method achieves dramatic speedup on six standard natural language processing problems. ...

In this paper we present a novel approach for inducing word alignments from sentence-aligned data. We use a Conditional Random Field (CRF), a discriminative model, which is estimated on a small supervised training set. The CRF is conditioned on both the source and target texts, and thus allows for the use of arbitrary and overlapping features over these data. Moreover, the CRF has efficient training and decoding processes which both find globally optimal solutions.

We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood.
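As a rough illustration of what such an objective looks like, the generic entropy-regularized form can be written as below. The notation is assumed rather than taken from the paper: $(x^{(i)}, y^{(i)})$ for $i = 1, \dots, N$ are labeled examples, $x^{(j)}$ for $j = N+1, \dots, M$ are unlabeled inputs, $p_\theta$ is the CRF's conditional distribution, and $\gamma \ge 0$ is a trade-off weight. The paper's exact objective may include further terms, such as a prior on $\theta$.

```latex
\[
\mathcal{L}(\theta)
  = \sum_{i=1}^{N} \log p_\theta\bigl(y^{(i)} \mid x^{(i)}\bigr)
  + \gamma \sum_{j=N+1}^{M} \sum_{y} p_\theta\bigl(y \mid x^{(j)}\bigr)
      \log p_\theta\bigl(y \mid x^{(j)}\bigr)
\]
```

The first sum is the labeled conditional log-likelihood. The second is the negative conditional entropy on unlabeled inputs, so maximizing $\mathcal{L}(\theta)$ favors confident predictions on the unlabeled data.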

Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. ...

Lecture "Advanced Econometrics (Part II), Chapter 10: Models for panel data" covers the following content: general framework for panel data, pooled regression, fixed effects, random effects model, choosing between fixed and random effects models, finding big.

Chapter 12 is devoted to access control, the duties of the data link layer that are related to the use of the physical layer. The main contents of this chapter include all of the following: Random access, controlled access, channelization.

Econometricians, as well as other scientists, are engaged in learning from their experience and data, a fundamental objective of science. Knowledge so obtained may be desired for its own sake, for example, to satisfy our curiosity about aspects of economic behavior, and/or for use in solving practical problems, for example, to improve economic policymaking. In the process of learning from experience and data, description and generalization both play important roles.

This compendium aims to provide a comprehensive overview of the main topics that appear in any well-structured course sequence in statistics for business and economics at the undergraduate and MBA levels. The idea is to supplement either formal or informal statistics textbooks such as, e.g., “Basic Statistical Ideas for Managers” by D.K. Hildebrand and R.L. Ott and “The Practice of Business Statistics: Using Data for Decisions” by D.S. Moore, G.P. McCabe, W.M. Duckworth and S.L. Sclove, with a summary of theory as well as a couple of extra examples.
