Probability model from

This paper proposes a novel method for learning probability models of subcategorization preference of verbs. We consider the issues of case dependencies and noun class generalization in a uniform way by employing the maximum entropy modeling method. We also propose a new model selection algorithm which starts from the most general model and gradually examines more specific models.
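For readers unfamiliar with maximum entropy modeling, the abstract above relies on the standard log-linear form $P(y \mid x) = \exp(\sum_i \lambda_i f_i(x, y)) / Z(x)$. The minimal sketch below only evaluates that form for a verb and a set of candidate case frames. The verb `kau`, the Japanese-style case markers `ga`/`wo`/`ni`, the feature templates and the weights are all hypothetical, hand-set values. The sketch does not train the weights, and it does not reproduce the paper's general-to-specific model selection or its noun class generalization.

```python
import math

def maxent_distribution(weights, features, x, labels):
    """P(y | x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x) for binary features."""
    scores = {y: sum(weights.get(f, 0.0) for f in features(x, y)) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    unnormalized = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(unnormalized.values())
    return {y: v / z for y, v in unnormalized.items()}

def frame_features(verb, frame):
    """Hypothetical features: one per case slot, plus one for the whole frame,
    so that a dependency between cases can receive its own weight."""
    active = [f"{verb}:{slot}" for slot in frame]
    active.append(f"{verb}:frame={'+'.join(frame)}")
    return active

# Hypothetical candidate frames and hand-set weights (not learned from data).
frames = [("ga",), ("ga", "wo"), ("ga", "ni")]
weights = {"kau:ga": 0.5, "kau:wo": 1.0, "kau:frame=ga+wo": 0.7}
p = maxent_distribution(weights, frame_features, "kau", frames)
```

The whole-frame feature is what lets such a model express a case dependency (here, `ga` and `wo` co-occurring) beyond the independent per-slot features.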

In the quest for knowledge, it is not uncommon for researchers to push simulation techniques to the point where they must be adapted or where entirely new techniques or approaches become necessary. True multiscale modeling techniques are becoming increasingly necessary given the growing interest in materials and processes whose large-scale properties depend on, or can be tuned by, their low-scale properties. An example would be nanocomposites, where embedded nanostructures completely change the matrix properties due to effects occurring at the atomic level.

User simulations have been shown to be useful in spoken dialog system development. Since most current user simulations deploy probability models to mimic human user behaviors, how to set up the user action probabilities in these models is a key problem. One generally used approach is to estimate these probabilities from human user data. However, when building a new dialog system, usually no data or only a small amount of data is available.
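The data-driven approach mentioned above can be sketched as relative-frequency estimation of $P(\text{user action} \mid \text{system action})$ from logged human dialogs, followed by sampling to drive a simulated user. This is a minimal illustration, not the paper's proposed method (which targets the case where little or no data exists). The action names are hypothetical, and the add-alpha smoothing is an assumption added so that unseen user actions keep a nonzero probability when data is scarce.

```python
import random
from collections import Counter, defaultdict

def estimate_action_probs(dialogs, user_actions, alpha=1.0):
    """Add-alpha smoothed estimate of P(user_action | system_action).

    dialogs: iterable of (system_action, user_action) pairs from human users.
    """
    counts = defaultdict(Counter)
    for system_action, user_action in dialogs:
        counts[system_action][user_action] += 1
    probs = {}
    for system_action, c in counts.items():
        total = sum(c.values()) + alpha * len(user_actions)
        probs[system_action] = {a: (c[a] + alpha) / total for a in user_actions}
    return probs

def simulate_user_action(probs, system_action, rng=random):
    """Sample a simulated user's response to one system action."""
    dist = probs[system_action]
    return rng.choices(list(dist), weights=list(dist.values()))[0]

# Hypothetical logged data: two real answers and one silence after "ask_date".
data = [("ask_date", "give_date"), ("ask_date", "give_date"), ("ask_date", "silence")]
actions = ["give_date", "silence", "hang_up"]
probs = estimate_action_probs(data, actions)
```

With only three observations, `hang_up` still receives probability 1/6, which illustrates why small data sets make the choice of estimator matter.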

Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ represents all possible terminations of the prefix $a_1 \cdots a_n$. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in $O(n^6)$ time. ...
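The link between prefix probabilities and the language model form above is the ratio $\Pr(a_n \mid a_1 \cdots a_{n-1}) = P_{\text{prefix}}(a_1 \cdots a_n) / P_{\text{prefix}}(a_1 \cdots a_{n-1})$. The minimal sketch below shows that ratio on a toy, hypothetical distribution over a finite set of complete strings, where the prefix probability can be computed by brute-force summation. A stochastic TAG generally defines an infinite language, which is exactly why the paper needs a dedicated $O(n^6)$ algorithm instead of enumeration.

```python
def prefix_probability(string_probs, prefix):
    """Sum of Pr(prefix + w) over every string w in a finite toy language
    (w may be empty, matching w in Sigma*)."""
    n = len(prefix)
    return sum(p for s, p in string_probs.items() if s[:n] == prefix)

def next_word_probability(string_probs, history, word):
    """Pr(a_n | a_1..a_{n-1}) as a ratio of two prefix probabilities."""
    denominator = prefix_probability(string_probs, history)
    return prefix_probability(string_probs, history + (word,)) / denominator

# Hypothetical distribution over complete strings (sums to 1).
toy = {("a", "b"): 0.4, ("a", "c"): 0.2, ("b",): 0.4}
```

Here `next_word_probability(toy, ("a",), "b")` is 0.4 / 0.6, about 0.667.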

This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system.
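A phrase-level error model of the kind described above can be sketched as a relative-frequency estimate of $P(\text{query phrase} \mid \text{correction phrase})$ over mined pairs, then plugged into a noisy-channel ranking $\arg\max_c P(c)\,P(q \mid c)$. Both the noisy-channel integration and the toy language-model probabilities are assumptions for illustration. The sketch also treats each whole query as a single phrase, skipping the phrase segmentation and alignment a real system would need.

```python
from collections import Counter, defaultdict

def train_error_model(phrase_pairs):
    """Relative-frequency estimate of P(query_phrase | correction_phrase)."""
    counts = defaultdict(Counter)
    for query_phrase, correction_phrase in phrase_pairs:
        counts[correction_phrase][query_phrase] += 1
    model = {}
    for correction, c in counts.items():
        total = sum(c.values())
        model[correction] = {q: n / total for q, n in c.items()}
    return model

def best_correction(query, candidates, lm_prob, model):
    """Noisy-channel ranking: argmax over c of P(c) * P(query | c)."""
    return max(candidates,
               key=lambda c: lm_prob.get(c, 0.0) * model.get(c, {}).get(query, 0.0))

# Hypothetical pairs mined from query reformulations.
pairs = [("britny spears", "britney spears"),
         ("britny spears", "britney spears"),
         ("brittany spears", "britney spears")]
model = train_error_model(pairs)
lm = {"britney spears": 0.01, "brittany spears": 0.001}  # hypothetical P(c)
```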

This paper compares a number of generative probability models for a wide-coverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabeled word-word dependencies, our best model achieves a performance of 89.9%, comparable to the figures given by Collins (1999) for a linguistically less expressive grammar. In contrast to Gildea (2001), we find a significant improvement from modeling word-word dependencies. ...
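An evaluation over unlabeled word-word dependencies, as mentioned above, compares sets of (head, dependent) word pairs while ignoring relation labels. The sketch below computes precision, recall and F1 over such sets. The abstract does not say which of these measures the 89.9% figure refers to, and the word indices in the example are hypothetical. Using sets rather than one head per word is a deliberate choice, since in CCG a word may have more than one head.

```python
def unlabeled_dependency_scores(gold, predicted):
    """Precision, recall and F1 over unlabeled (head, dependent) word-index pairs."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted dependencies (head index, dependent index).
gold = [(2, 1), (0, 2), (2, 4), (4, 3)]
predicted = [(2, 1), (0, 2), (2, 3), (3, 4)]
scores = unlabeled_dependency_scores(gold, predicted)
```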

Chapter 4, Bayes Classifier, presents the naïve Bayes probabilistic model, constructing a classifier from the probability model, an application of the naïve Bayes classifier, and Bayesian networks.
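Constructing a classifier from the naïve Bayes probability model, as listed above, amounts to picking the class that maximizes $\log P(c) + \sum_i \log P(w_i \mid c)$. The sketch below is one common variant (multinomial naïve Bayes over word tokens with add-one smoothing); the chapter may use a different event model or smoothing, and the spam/ham data is hypothetical. Bayesian networks are not covered by this sketch.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def log_posterior(self, words, label):
        # log P(c) + sum_i log P(w_i | c); the constant log P(words) is dropped.
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for w in words:
            count = self.word_counts[label][w]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, words):
        return max(self.class_counts, key=lambda c: self.log_posterior(words, c))

# Hypothetical training data.
docs = [["cheap", "pills"], ["cheap", "offer"], ["meeting", "notes"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
nb = NaiveBayes().fit(docs, labels)
```

Working in log space avoids underflow when a document contains many words.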

Econometricians, as well as other scientists, are engaged in learning from their experience and data, which is a fundamental objective of science. Knowledge so obtained may be desired for its own sake, for example to satisfy our curiosity about aspects of economic behavior, and/or for use in solving practical problems, for example to improve economic policymaking. In the process of learning from experience and data, description and generalization both play important roles.

There are many books written about statistics, some brief, some detailed, some humorous, some colorful, and some quite dry. Each of these texts is designed for a specific audience. Too often, texts about statistics have been rather theoretical and intimidating for those not practicing statistical analysis on a routine basis. Thus, many engineers and scientists, who need to use statistics much more frequently than calculus or differential equations, lack sufficient knowledge of the use of statistics.

Continuing improvements led to the furnace and bellows and provided the ability to smelt and forge native metals (naturally occurring in relatively pure form).[38] Gold, copper, silver, and lead were such early metals. The advantages of copper tools over stone, bone, and wooden tools were quickly apparent to early humans, and native copper was probably used from near the beginning of Neolithic times (about 8000 BC).[39] Native copper does not naturally occur in large amounts, but copper ores are quite common and some of them produce metal easily when burned in wood or charcoal fires.

This paper presents an algorithm for learning the probabilities of optional phonological rules from corpora. The algorithm is based on using a speech recognition system to discover the surface pronunciations of words in speech corpora; using an automatic system obviates expensive phonetic labeling by hand. We describe the details of our algorithm and show the probabilities the system has learned for ten common phonological rules which model reductions and coarticulation effects.
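Once a recognizer has chosen a surface pronunciation for each word token, the probability of an optional rule can be estimated as the fraction of tokens where the rule's context is met and the rule actually applied. The sketch below shows that simple counting estimator; the paper's exact estimation details may differ, and the rule names and observations are hypothetical.

```python
from collections import Counter

def rule_probabilities(observations):
    """Estimate P(rule applies) = applied / applicable for each optional rule.

    observations: iterable of (rule_name, applied) pairs, one per word token
    whose context allows the rule; `applied` says whether the surface
    pronunciation chosen by the recognizer shows the rule's effect.
    """
    applicable, applied = Counter(), Counter()
    for rule, did_apply in observations:
        applicable[rule] += 1
        if did_apply:
            applied[rule] += 1
    return {rule: applied[rule] / applicable[rule] for rule in applicable}

# Hypothetical recognizer output.
obs = [("flapping", True), ("flapping", True), ("flapping", False),
       ("schwa_deletion", False)]
rule_probs = rule_probabilities(obs)
```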

We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate a reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.
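To make modifications (1) and (2) above concrete, the sketch below is plain EM training of IBM Model 1 with two hypothetical knobs: `null_weight` scales the null word's score in the E-step, and `smoothing` adds a constant count to every word pair in the M-step. These are assumed stand-ins, not the paper's exact formulations (its smoothing targets rare words specifically), and the heuristic initialization in (3) is not shown. The three-sentence bitext is a toy example.

```python
from collections import defaultdict

NULL = "<null>"

def train_ibm1(bitext, iterations=10, null_weight=1.0, smoothing=0.0):
    """EM for IBM Model 1 translation probabilities t[(f, e)] = t(f | e).

    bitext: list of (source_words, target_words) sentence pairs.
    null_weight: hypothetical knob scaling t(f | NULL) in the E-step (1.0 = plain Model 1).
    smoothing: hypothetical add-n constant applied to every (f, e) pair in the M-step.
    """
    target_vocab = {f for _, tgt in bitext for f in tgt}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform start
    for _ in range(iterations):
        counts, totals = defaultdict(float), defaultdict(float)
        # E-step: split each target word's count over its possible generators.
        for src, tgt in bitext:
            generators = [NULL] + list(src)
            for f in tgt:
                scores = [t[(f, e)] * (null_weight if e == NULL else 1.0) for e in generators]
                z = sum(scores)
                for e, s in zip(generators, scores):
                    counts[(f, e)] += s / z
                    totals[e] += s / z
        # M-step: renormalize expected counts, with optional add-n smoothing.
        new_t = {}
        for e, total in totals.items():
            denominator = total + smoothing * len(target_vocab)
            for f in target_vocab:
                new_t[(f, e)] = (counts[(f, e)] + smoothing) / denominator
        t = defaultdict(float, new_t)
    return t

bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm1(bitext)
```

With the defaults (`null_weight=1.0`, `smoothing=0.0`) this reduces to standard Model 1 training, so the two knobs can be compared against the baseline directly.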

We present the PONG method to compute selectional preferences using part-of-speech (POS) N-grams. From a corpus labeled with grammatical dependencies, PONG learns the distribution of word relations for each POS N-gram. From the much larger but unlabeled Google N-grams corpus, PONG learns the distribution of POS N-grams for a given pair of words. We derive the probability that one word has a given grammatical relation to the other. PONG estimates this probability by combining both distributions, whether or not either word occurs in the labeled corpus. ...
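One natural way to combine the two distributions described above is to marginalize over POS N-grams $t$: $P(r \mid w_1, w_2) \approx \sum_t P(r \mid t)\,P(t \mid w_1, w_2)$. This assumes the relation depends on the words only through their POS N-gram. The sketch below implements that assumed combination, which may differ from PONG's actual estimator. The tag sequences, relation names and probabilities are hypothetical.

```python
def relation_probability(rel, w1, w2, p_rel_given_ngram, p_ngram_given_words):
    """P(rel | w1, w2) ~= sum over POS N-grams t of P(rel | t) * P(t | w1, w2)."""
    dist = p_ngram_given_words.get((w1, w2), {})
    return sum(p_t * p_rel_given_ngram.get(t, {}).get(rel, 0.0) for t, p_t in dist.items())

# Hypothetical distribution learned from a dependency-labeled corpus.
p_rel_given_ngram = {
    ("VB", "DT", "NN"): {"dobj": 0.8, "nsubj": 0.05},
    ("VB", "IN", "NN"): {"dobj": 0.1, "pobj": 0.7},
}
# Hypothetical distribution learned from unlabeled N-gram counts.
p_ngram_given_words = {("eat", "apple"): {("VB", "DT", "NN"): 0.6, ("VB", "IN", "NN"): 0.4}}
p_dobj = relation_probability("dobj", "eat", "apple", p_rel_given_ngram, p_ngram_given_words)
```

Because the word-pair side comes only from unlabeled N-gram counts, this kind of estimate is available even when neither word occurs in the labeled corpus.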

Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent overfitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task.
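The idea behind leaving-one-out, as described above, is that when scoring phrases for one training sentence pair, that sentence's own phrase counts are subtracted first, so a phrase pair cannot be supported only by the sentence it was extracted from. The sketch below shows that count adjustment for a relative-frequency estimate of $p(\text{tgt} \mid \text{src})$. It is an illustration under that assumption: it omits the training procedure the paper builds around it, and a phrase pair seen only in the current sentence simply gets zero here, where a real system would need smoothing. The phrase counts are hypothetical.

```python
from collections import Counter

def leave_one_out_prob(corpus_counts, sentence_counts, src, tgt):
    """p(tgt | src) after subtracting the current sentence pair's own counts.

    corpus_counts / sentence_counts: Counters of (src_phrase, tgt_phrase)
    extracted from the whole corpus and from the current sentence pair.
    """
    pair_count = corpus_counts[(src, tgt)] - sentence_counts[(src, tgt)]
    src_total = sum(n for (s, _), n in corpus_counts.items() if s == src)
    src_total -= sum(n for (s, _), n in sentence_counts.items() if s == src)
    return pair_count / src_total if src_total > 0 else 0.0

# Hypothetical counts: "the home" was extracted only from the current sentence.
corpus_counts = Counter({("das haus", "the house"): 3, ("das haus", "the home"): 1})
sentence_counts = Counter({("das haus", "the home"): 1})
```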

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
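A maximum-probability segmentation that needs no training data, as described above, can be found by dynamic programming over segment boundaries, with each segment scored by a model estimated from the text itself. The sketch below assumes a Laplace-smoothed unigram model fitted inside each segment plus a per-segment penalty of $\log N$ ($N$ = total words). These cost terms are an illustrative assumption rather than a verified copy of the paper's model. Sentences are lists of words, and the function returns the end index of each segment.

```python
import math
from collections import Counter

def segment_cost(words, vocab_size, total_words):
    """-log P(words | segment) under a Laplace-smoothed unigram model fitted
    inside the segment, plus a per-segment penalty of log(total_words)."""
    counts = Counter(words)
    n = len(words)
    cost = sum(-math.log((counts[w] + 1) / (n + vocab_size)) for w in words)
    return cost + math.log(total_words)

def segment_text(sentences):
    """Return segment end indices of the minimum-cost (maximum-probability) split."""
    all_words = [w for s in sentences for w in s]
    vocab_size, total_words = len(set(all_words)), len(all_words)
    m = len(sentences)
    best = [0.0] + [math.inf] * m  # best[j]: lowest cost for sentences[:j]
    back = [0] * (m + 1)
    for j in range(1, m + 1):
        for i in range(j):
            words = [w for s in sentences[i:j] for w in s]
            cost = best[i] + segment_cost(words, vocab_size, total_words)
            if cost < best[j]:
                best[j], back[j] = cost, i
    boundaries, j = [], m
    while j > 0:  # follow back-pointers from the end of the text
        boundaries.append(j)
        j = back[j]
    return sorted(boundaries)

# Hypothetical text with an obvious topic shift after the second sentence.
sentences = [["cat", "dog", "cat"], ["dog", "cat", "dog"],
             ["stock", "market", "stock"], ["market", "stock", "market"]]
```

Minimizing the summed negative log-probabilities is equivalent to maximizing the product of the segment probabilities, and the $O(m^2)$ loop considers every possible segment.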

Language modeling associates a sequence of words with an a priori probability, which is a key part of many natural language applications such as speech recognition and statistical machine translation. In this paper, we present a language model based on a kind of simple dependency grammar. The grammar consists of head-dependent relations between words and can be learned automatically from a raw corpus using the re-estimation algorithm, which is also introduced in this paper. Our experiments show that the proposed model performs better than n-gram models at 11% to 11.
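To illustrate a grammar made of head-dependent relations, the sketch below scores one sentence under one given dependency tree as the product of $P(\text{dependent} \mid \text{head})$ over its links. This is a deliberate simplification and an assumption: it ignores link direction and stopping, and it does not sum over alternative trees or perform the paper's re-estimation. The `<root>` symbol and all probabilities are hypothetical.

```python
import math

def sentence_log_prob(words, heads, dep_prob):
    """log P(sentence) = sum of log P(dependent | head) over the tree's links.

    heads[i] is the index of word i's head, or -1 for the root word, whose
    probability is looked up under the hypothetical head symbol "<root>".
    """
    total = 0.0
    for i, h in enumerate(heads):
        head_word = "<root>" if h < 0 else words[h]
        total += math.log(dep_prob[(head_word, words[i])])
    return total

# Hypothetical probabilities and tree: "bark" heads both "dogs" and "loudly".
dep_prob = {("<root>", "bark"): 0.1, ("bark", "dogs"): 0.3, ("bark", "loudly"): 0.2}
lp = sentence_log_prob(["dogs", "bark", "loudly"], [1, -1, 1], dep_prob)
```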

While PCFGs can be accurate, they suffer from vocabulary coverage problems: treebanks are small, and lexicons induced from them are limited. The reason for this treebank-centric view in PCFG learning is threefold: (1) the English treebank is fairly large and English morphology is fairly simple, so in English the treebank does provide mostly adequate lexical coverage; (2) lexicons enumerate analyses but don't provide probabilities for them; and, most importantly, (3) the treebank and the external lexicon are likely to follow different annotation schemas, reflecting different linguistic perspectives.

We describe a novel method that extracts paraphrases from a bitext, for both the source and target languages. In order to reduce the search space, we decompose the phrase table into sub-phrase-tables and construct separate clusters for source and target phrases. We convert the clusters into graphs, add smoothing/syntactic-information-carrier vertices, and compute the similarity between phrases with a random-walk-based measure, the commute time.
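The commute time mentioned above is the expected number of steps a random walk takes to go from node $i$ to node $j$ and back again, so smaller values mean more closely connected phrases. The sketch below computes it by solving the hitting-time equations $h(j) = 0$, $h(i) = 1 + \sum_k P_{ik}\,h(k)$ with plain Gauss-Jordan elimination. An equivalent closed form is $\mathrm{vol}(G)\,(L^+_{ii} + L^+_{jj} - 2L^+_{ij})$ using the pseudoinverse of the graph Laplacian. The small path graph is hypothetical, and the paper's graph construction (clusters, smoothing and syntactic vertices) is not reproduced.

```python
def hitting_times(weights, target):
    """Expected random-walk steps from every node to `target`.

    weights: symmetric adjacency matrix (list of lists) of a connected graph;
    the walk moves from i to k with probability weights[i][k] / degree(i).
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    others = [i for i in range(n) if i != target]
    col = {node: c for c, node in enumerate(others)}
    m = len(others)
    # Augmented matrix for h(i) - sum_{k != target} P(i, k) h(k) = 1.
    a = [[0.0] * (m + 1) for _ in range(m)]
    for i in others:
        r = col[i]
        a[r][r] += 1.0
        for k in others:
            if weights[i][k]:
                a[r][col[k]] -= weights[i][k] / degrees[i]
        a[r][m] = 1.0
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(m):
            if r != c and a[r][c]:
                factor = a[r][c] / a[c][c]
                for k in range(c, m + 1):
                    a[r][k] -= factor * a[c][k]
    h = [0.0] * n
    for i in others:
        h[i] = a[col[i]][m] / a[col[i]][col[i]]
    return h

def commute_time(weights, i, j):
    """Expected steps for a round trip i -> j -> i."""
    return hitting_times(weights, j)[i] + hitting_times(weights, i)[j]

# Hypothetical path graph 0 - 1 - 2 with unit edge weights.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

On this path graph, the commute time between the two end nodes is 8 steps, and between neighbors 0 and 1 it is 4.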

In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint.
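The greedy, one-to-one harvesting described above can be sketched as follows. Each word is represented by a vector of per-topic weights, all cross-language pairs are ranked by a symmetric similarity, and a pair is accepted only if neither word has already been used. The cosine similarity, the threshold, and the Dutch/English words and vectors are all assumptions for illustration. The paper's specific symmetrization process and similarity measure are not reproduced here.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def harvest_translations(source_vecs, target_vecs, threshold=0.0):
    """Greedily accept the most similar cross-language pairs, one-to-one."""
    pairs = sorted(
        ((cosine(sv, tv), s, t)
         for s, sv in source_vecs.items() for t, tv in target_vecs.items()),
        reverse=True,
    )
    used_source, used_target, result = set(), set(), []
    for sim, s, t in pairs:
        if sim <= threshold:
            break  # remaining pairs are even less similar
        if s not in used_source and t not in used_target:
            result.append((s, t))
            used_source.add(s)
            used_target.add(t)
    return result

# Hypothetical per-topic vectors over two shared topics.
source_vecs = {"hond": [0.9, 0.1], "kat": [0.2, 0.8]}
target_vecs = {"dog": [0.85, 0.15], "cat": [0.1, 0.9]}
found = harvest_translations(source_vecs, target_vecs)
```

Raising `threshold` trades recall for precision, in line with the precision-oriented goal stated in the abstract.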

In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. ...
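Combining corpus fragments "in the most probable way", as described above, is usually formalized with a relative-frequency estimate: a fragment's probability is its count divided by the total count of fragments with the same root label. A derivation's probability is then the product of the probabilities of its fragments, and an analysis's probability is the sum over its derivations. The sketch below shows this DOP1-style estimator with fragments encoded as hypothetical bracketed strings. It leaves out the semantic annotations the paper attaches to fragments.

```python
from collections import Counter

def fragment_probabilities(fragment_counts):
    """P(f) = count(f) / total count of corpus fragments with f's root label.

    fragment_counts maps (root_label, fragment) -> count, where `fragment`
    is any hashable encoding of a corpus subtree.
    """
    root_totals = Counter()
    for (root, _), n in fragment_counts.items():
        root_totals[root] += n
    return {key: n / root_totals[key[0]] for key, n in fragment_counts.items()}

def derivation_probability(derivation, probs):
    """Probability of one derivation: the product of its fragments' probabilities."""
    p = 1.0
    for key in derivation:
        p *= probs[key]
    return p

# Hypothetical fragment counts extracted from a tiny annotated corpus.
counts = {("S", "(S NP VP)"): 2, ("S", "(S (NP John) VP)"): 1,
          ("NP", "(NP John)"): 3, ("VP", "(VP sleeps)"): 1}
probs = fragment_probabilities(counts)
d1 = [("S", "(S NP VP)"), ("NP", "(NP John)"), ("VP", "(VP sleeps)")]
d2 = [("S", "(S (NP John) VP)"), ("VP", "(VP sleeps)")]
```

Both derivations build the same analysis of "John sleeps", so that analysis's probability is the sum of the two derivation probabilities.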