Random data

Showing 1-20 of 122 results for Random data
  • We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, of each sentence in the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data selection and two other previously proposed methods. ...

    pdf5p hongdo_1 12-04-2013 19 2

  • If you were to ask a random sampling of people what data analysis is, most would say that it is the process of calculating and summarizing data to get an answer to a question. In one sense, they are correct. However, the actions they are describing represent only a small part of the process known as data analysis.

    pdf50p ptng13 17-05-2012 45 11

  • Technical analysts often find a system or technical method that seems extremely profitable and convenient to follow, one that they think has been overlooked by the professionals. Sometimes they are right, but more often the method does not work in actual trading or does not keep working over time. Technical analysis uses price and related data to decide when to buy and sell. The methods used can be as interpretive as chart patterns and astrology, or as specific as mathematical formulas and spectral analysis. All factors that influence the markets are assumed to be netted out in the current price. ...

    pdf22p vigro23 29-08-2012 32 4

  • General list: no restrictions on which operations can be used on the list, and no restrictions on where data can be inserted or deleted. Unordered list (random list): data are in no particular order. Ordered list: data are arranged according to a key.

    pdf71p trinh02 28-01-2013 22 3

  • Nowadays, large amounts of data are available to train statistical machine translation systems. However, it is not clear whether all of the training data actually help. A system trained on a subset of such huge bilingual corpora might outperform one trained on all of the bilingual data. This paper studies these issues by analysing two training data selection techniques: one based on approximating the probability of an in-domain corpus, and another based on infrequent n-gram occurrence.

    pdf10p bunthai_1 06-05-2013 19 3

  • This study examines the technical and scale efficiencies of a sample of irrigated and rainfed rice farmers in Anambra State using data envelopment analysis (DEA). Two local government areas were purposively selected, and three communities were randomly selected from each, giving a total of six communities.

    pdf7p tunghai08 28-06-2015 7 3

  • Frequency distribution models tuned to words and other linguistic events can predict the number of distinct types and their frequency distribution in samples of arbitrary sizes. We conduct, for the first time, a rigorous evaluation of these models based on cross-validation and separation of training and test data. Our experiments reveal that the prediction accuracy of the models is marred by serious overfitting problems, due to violations of the random sampling assumption in corpus data. We then propose a simple pre-processing method to alleviate such non-randomness problems. ...

    pdf8p hongvang_1 16-04-2013 21 2

  • In this paper, we explore the power of randomized algorithms to address the challenge of working with very large amounts of data. We apply these algorithms to generate noun similarity lists from 70 million pages, reducing the running time from quadratic to practically linear in the number of elements to be computed.

    pdf8p bunbo_1 17-04-2013 13 2

  • This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. ...

    pdf8p bunbo_1 17-04-2013 14 2

  • The lecture "Advanced Econometrics (Part II) - Chapter 10: Models for panel data" covers the following content: the general framework for panel data, pooled regression, the fixed effects model, the random effects model, choosing between fixed and random effects models, finding big.

    pdf0p nghe123 06-05-2016 9 2

  • This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. ...

    pdf9p hongphan_1 15-04-2013 19 1

  • Discriminative feature-based methods are widely used in natural language processing, but sentence parsing is still dominated by generative methods. While prior feature-based dynamic programming parsers have restricted training and evaluation to artificially short sentences, we present the first general, feature-rich discriminative parser, based on a conditional random field model, which has been successfully scaled to the full WSJ parsing data.

    pdf9p hongphan_1 15-04-2013 18 1

  • This paper presents an efficient inference algorithm of conditional random fields (CRFs) for large-scale data. Our key idea is to decompose the output label state into an active set and an inactive set in which most unsupported transitions become a constant. Our method unifies two previous methods for efficient inference of CRFs, and also derives a simple but robust special case that performs faster than exact inference when the active sets are sufficiently small. We demonstrate that our method achieves dramatic speedup on six standard natural language processing problems. ...

    pdf4p hongphan_1 15-04-2013 15 1

  • In this paper we present a novel approach for inducing word alignments from sentence aligned data. We use a Conditional Random Field (CRF), a discriminative model, which is estimated on a small supervised training set. The CRF is conditioned on both the source and target texts, and thus allows for the use of arbitrary and overlapping features over these data. Moreover, the CRF has efficient training and decoding processes which both find globally optimal solutions.

    pdf8p hongvang_1 16-04-2013 14 1

  • We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood.

    pdf8p hongvang_1 16-04-2013 14 1

  • Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and named-entity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. ...

    pdf9p bunbo_1 17-04-2013 15 1

  • Chapter 12 is devoted to access control, the duties of the data link layer that are related to the use of the physical layer. The main contents of this chapter include the following: random access, controlled access, and channelization.

    ppt54p tangtuy04 12-03-2016 3 1

  • Econometricians, as well as other scientists, are engaged in learning from their experience and data - a fundamental objective of science. Knowledge so obtained may be desired for its own sake, for example to satisfy our curiosity about aspects of economic behavior and/or for use in solving practical problems, for example to improve economic policymaking. In the process of learning from experience and data, description and generalization both play important roles.

    pdf112p phuonghoangnho 23-04-2010 249 146

  • This compendium aims to provide a comprehensive overview of the main topics that appear in any well-structured course sequence in statistics for business and economics at the undergraduate and MBA levels. The idea is to supplement either formal or informal statistics textbooks such as, e.g., “Basic Statistical Ideas for Managers” by D.K. Hildebrand and R.L. Ott and “The Practice of Business Statistics: Using Data for Decisions” by D.S. Moore, G.P. McCabe, W.M. Duckworth and S.L. Sclove, with a summary of theory as well as a couple of extra examples.

    pdf0p sofia11 15-05-2012 83 36

  • If you were to ask a random sampling of people what data analysis is, most would say that it is the process of calculating and summarizing data to get an answer to a question. In one sense, they are correct. However, the actions they are describing represent only a small part of the process known as data analysis.

    pdf30p vongsuiphat 04-01-2010 103 24
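The first result describes selecting general-domain training sentences by comparing each sentence's cross-entropy under a domain-specific and a non-domain-specific language model. The sketch below scores each candidate by H_in(s) - H_out(s) and keeps the lowest-scoring sentences. It assumes add-one-smoothed unigram models and trains the general model on the whole candidate pool; both are simplifications for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Return a log-probability function for an add-one-smoothed unigram LM."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen tokens
    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab_size))
    return logprob

def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a model."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return -sum(logprob(t) for t in tokens) / len(tokens)

def select_by_ce_difference(in_domain, pool, keep):
    """Keep the `keep` pool sentences with the lowest H_in(s) - H_out(s)."""
    lm_in = train_unigram(in_domain)
    lm_out = train_unigram(pool)  # simplification: general model trained on the whole pool
    def score(s):
        return cross_entropy(lm_in, s) - cross_entropy(lm_out, s)
    return sorted(pool, key=score)[:keep]

in_domain = ["the patient received a dose", "the doctor prescribed a dose"]
pool = ["the patient took a dose", "stocks fell sharply today", "markets rallied on news"]
print(select_by_ce_difference(in_domain, pool, keep=1))  # ['the patient took a dose']
```

A negative score means the sentence looks more like the in-domain text than like the general pool, which is why the lowest scores are kept.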
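The list-structures entry distinguishes a general (unordered, "random") list, where data may be inserted or deleted anywhere, from an ordered list, where data are arranged according to a key. A minimal sketch of both follows; the class names are hypothetical, and the ordered list keeps a parallel list of keys so that it can use the standard `bisect` functions on any Python 3 version.

```python
import bisect

class UnorderedList:
    """General (random) list: no ordering; insert or delete anywhere."""
    def __init__(self):
        self.items = []

    def insert(self, value, position=None):
        # Append by default; otherwise insert at the requested index.
        if position is None:
            self.items.append(value)
        else:
            self.items.insert(position, value)

    def delete(self, value):
        self.items.remove(value)  # removes the first occurrence

class OrderedList:
    """Ordered list: items kept sorted by key(item)."""
    def __init__(self, key=lambda item: item):
        self.key = key
        self._keys = []  # parallel list of keys, always sorted
        self.items = []

    def insert(self, value):
        k = self.key(value)
        i = bisect.bisect_right(self._keys, k)  # place after any equal keys
        self._keys.insert(i, k)
        self.items.insert(i, value)

    def delete(self, value):
        k = self.key(value)
        i = bisect.bisect_left(self._keys, k)
        # Scan the run of equal keys for the matching item.
        while i < len(self._keys) and self._keys[i] == k:
            if self.items[i] == value:
                del self._keys[i]
                del self.items[i]
                return
            i += 1
        raise ValueError(f"{value!r} not in list")

ol = OrderedList(key=lambda rec: rec[0])
for rec in [(3, "c"), (1, "a"), (2, "b")]:
    ol.insert(rec)
print(ol.items)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

The ordered list finds its insertion point by binary search, but inserting into a Python list still shifts elements, so each insertion is linear time in the worst case.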
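The noun-similarity entry does not name its randomized algorithm. One standard choice for cheap cosine similarity on very large data is random-hyperplane locality-sensitive hashing. Each vector is reduced to a short bit signature, and the fraction of differing bits estimates angle / pi. The sketch below assumes that technique and uses hypothetical function names. It shows only the signature and estimate steps; the step that avoids comparing all pairs, which is what yields near-linear running time, is not shown.

```python
import math
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return [1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes]

def estimated_cosine(sig_a, sig_b):
    """Estimate cos(angle) from the fraction of differing bits (angle ~ pi * fraction)."""
    differing = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * differing / len(sig_a))

planes = make_hyperplanes(dim=4, n_bits=256, seed=42)
u = [1.0, 2.0, 0.0, 3.0]
v = [1.0, 2.0, 0.5, 2.5]  # true cosine with u is about 0.985
sig_u, sig_v = signature(u, planes), signature(v, planes)
print(estimated_cosine(sig_u, sig_v))  # an approximation; accuracy grows with n_bits
```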
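The panel-data lecture contrasts pooled regression with the fixed effects model. The fixed effects (within) estimator subtracts each entity's mean from its observations, which removes entity-specific intercepts before fitting the slope. The sketch below handles a single regressor and compares the result with a pooled slope on a toy panel in which each entity has its own intercept. The function names and data are illustrative assumptions, and the random effects model is not shown.

```python
from collections import defaultdict

def within_slope(entity, x, y):
    """Fixed-effects (within) slope for a single regressor."""
    groups = defaultdict(list)
    for i, e in enumerate(entity):
        groups[e].append(i)
    x_dm = [0.0] * len(x)
    y_dm = [0.0] * len(y)
    for idx in groups.values():
        # Demean x and y within each entity.
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            x_dm[i] = x[i] - mx
            y_dm[i] = y[i] - my
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    sxx = sum(a * a for a in x_dm)
    return sxy / sxx

def pooled_slope(x, y):
    """Ordinary least-squares slope that ignores the panel structure."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

entity = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 8, 10, 12]  # y = 10 + 2x for A, y = 0 + 2x for B
print(within_slope(entity, x, y))  # 2.0
print(pooled_slope(x, y))          # about -0.571
```

In this toy panel the entity with larger x values has a lower intercept, so pooling flips the sign of the slope, while the within estimator recovers the common slope of 2.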
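The access-control chapter lists random access among its topics. The entry does not say which protocols the slides cover, but ALOHA is a standard random-access example. Its textbook throughput formulas, with G the offered load in frames per frame time, are S = G·e^(-2G) for pure ALOHA and S = G·e^(-G) for slotted ALOHA. The short sketch below evaluates both; the function names are illustrative.

```python
import math

def pure_aloha_throughput(g):
    """Expected successful frames per frame time: S = G * exp(-2G)."""
    return g * math.exp(-2 * g)

def slotted_aloha_throughput(g):
    """Expected successful frames per slot: S = G * exp(-G)."""
    return g * math.exp(-g)

for g in (0.25, 0.5, 1.0, 2.0):
    print(f"G={g}: pure={pure_aloha_throughput(g):.3f}, "
          f"slotted={slotted_aloha_throughput(g):.3f}")
```

Pure ALOHA peaks at 1/(2e) ≈ 0.184 when G = 0.5, and slotted ALOHA peaks at 1/e ≈ 0.368 when G = 1. Synchronizing transmissions to slot boundaries halves the vulnerable period, which doubles the peak throughput.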
