Word alignment plays a crucial role in statistical machine translation. Word-aligned corpora have been found to be an excellent source of translation-related knowledge. We present a statistical model for computing the probability of an alignment given a sentence pair. This model allows easy integration of context-specific features. Our experiments show that this model can be an effective tool for improving an existing word alignment.
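To make this kind of model concrete, here is a minimal, hypothetical sketch of a per-link log-linear distribution P(a_j = i | src, tgt) in which context-specific information enters simply as extra feature functions; the feature names, weights, and sentences are invented for illustration and are not the paper's actual model.

```python
# A minimal sketch (not the paper's model): per-link log-linear alignment
# probability P(a_j = i | src, tgt) computed as a softmax over feature scores.
# Feature names and weights are hypothetical.
import math

def features(i, j, src, tgt):
    """Hypothetical context-specific features for linking tgt[j] to src[i]."""
    return {
        "identical": 1.0 if src[i].lower() == tgt[j].lower() else 0.0,
        "rel_distance": -abs(i / len(src) - j / len(tgt)),
        "both_capitalized": 1.0 if src[i][:1].isupper() and tgt[j][:1].isupper() else 0.0,
    }

def link_probs(j, src, tgt, weights):
    """P(a_j = i | src, tgt) for every source position i."""
    scores = [
        sum(weights.get(name, 0.0) * value
            for name, value in features(i, j, src, tgt).items())
        for i in range(len(src))
    ]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

weights = {"identical": 2.0, "rel_distance": 1.5, "both_capitalized": 0.5}
src = ["das", "Haus", "ist", "klein"]
tgt = ["the", "house", "is", "small"]
print(link_probs(1, src, tgt, weights))  # most mass on src position 1 ("Haus")
```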
This paper proposes a novel method for learning probability models of subcategorization preference of verbs. We consider the issues of case dependencies and noun class generalization in a uniform way by employing the maximum entropy modeling method. We also propose a new model selection algorithm which starts from the most general model and gradually examines more specific models.
User simulations are shown to be useful in spoken dialog system development. Since most current user simulations deploy probability models to mimic human user behaviors, how to set up user action probabilities in these models is a key problem to solve. One generally used approach is to estimate these probabilities from human user data. However, when building a new dialog system, usually no data or only a small amount of data is available.
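As a concrete illustration of such a probability model, the following sketch samples user actions from a hypothetical table P(user action | system action); in practice these probabilities would be estimated from human user data when it is available, and the action names and values here are placeholders.

```python
# A minimal sketch of a probabilistic user simulation: sample the user's next
# action from P(user_action | system_action). All actions and probabilities
# below are illustrative placeholders.
import random

user_model = {
    "request_slot": {"provide_value": 0.7, "ask_clarification": 0.2, "hang_up": 0.1},
    "confirm_slot": {"affirm": 0.8, "deny": 0.15, "hang_up": 0.05},
}

def simulate_user(system_action, rng=random):
    """Sample one user action given the system's last action."""
    actions, probs = zip(*user_model[system_action].items())
    return rng.choices(actions, weights=probs, k=1)[0]

for _ in range(3):
    print(simulate_user("request_slot"))
```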
Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability Σ_{w ∈ Σ*} Pr(a_1 ... a_n w), where w ranges over all possible terminations of the prefix a_1 ... a_n. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in O(n^6) time. ...
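For intuition only, the sketch below approximates the prefix probability Σ_{w ∈ Σ*} Pr(a_1 ... a_n w) for a toy PCFG by brute-force enumeration of derivations up to a length bound; the grammar and bound are invented, and this is not the paper's O(n^6) TAG algorithm.

```python
# A rough, hypothetical illustration of a prefix probability: sum the
# probabilities of all (bounded-length) sentences that start with the prefix.
# Toy PCFG; brute-force enumeration, not the paper's TAG algorithm.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("the", "dog"), 0.6), (("the", "cat"), 0.4)],
    "VP": [(("barks",), 0.7), (("sleeps",), 0.3)],
}

def expand(symbols, prob, max_len, out):
    """Expand the leftmost nonterminal, collecting fully terminal sentences."""
    if all(s not in PCFG for s in symbols):
        out.append((tuple(symbols), prob))
        return
    if len(symbols) > max_len:
        return
    i = next(k for k, s in enumerate(symbols) if s in PCFG)
    for rhs, p in PCFG[symbols[i]]:
        expand(symbols[:i] + list(rhs) + symbols[i + 1:], prob * p, max_len, out)

def prefix_probability(prefix, max_len=6):
    sentences = []
    expand(["S"], 1.0, max_len, sentences)
    return sum(p for s, p in sentences if s[:len(prefix)] == tuple(prefix))

print(prefix_probability(["the"]))         # 1.0: every sentence starts with "the"
print(prefix_probability(["the", "dog"]))  # 0.6
```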
This paper compares a number of generative probability models for a wide-coverage Combinatory Categorial Grammar (CCG) parser. These models are trained and tested on a corpus obtained by translating the Penn Treebank trees into CCG normal-form derivations. According to an evaluation of unlabeled word-word dependencies, our best model achieves a performance of 89.9%, comparable to the figures given by Collins (1999) for a linguistically less expressive grammar. In contrast to Gildea (2001), we find a significant improvement from modeling word-word dependencies. ...
This book will teach you how to bring together what you know of finance, accounting, and the spreadsheet to give you a new skill: building financial models. The ability to create and understand models is one of the most valued skills in business and finance today. It's an expertise that will stand you in good stead in any arena, Wall Street or Main Street, where numbers are important. Whether you are a veteran, just starting out on your career, or still in school, having this expertise can give you a competitive advantage in what you want to do....
This book is about foundational issues in risk and risk analysis: how risk should be expressed; what the meaning of risk is; how to understand and use models; how to understand and address uncertainty; and how parametric probability models like the Poisson model should be understood and used. A unifying and holistic approach to risk and uncertainty is presented for different applications.
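As a small illustration of the kind of parametric model the book refers to, the snippet below evaluates the Poisson probability mass function for a hypothetical incident rate; the rate and event interpretation are purely illustrative.

```python
# Poisson model P(N = k) = exp(-rate) * rate**k / k! for event counts.
# The rate below (2 incidents per year) is a made-up illustration.
import math

def poisson_pmf(k, rate):
    return math.exp(-rate) * rate**k / math.factorial(k)

rate = 2.0
print(poisson_pmf(0, rate))                         # probability of no incidents
print(sum(poisson_pmf(k, rate) for k in range(4)))  # probability of at most 3
```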
Air pollution has long been a trans-boundary environmental problem and a matter of global concern. High concentrations of air pollutants resulting from numerous anthropogenic activities degrade air quality. There are many books on this subject, but the one in front of you will probably help fill the gaps in the areas of air quality monitoring, modelling, exposure, health, and control, and can be of great help to graduate students, professionals, and researchers.
A reference to the book 'Probability for Finance' by Patrick Roger, Strasbourg University, EM Strasbourg Business School (May), covering finance and banking and corporate finance, intended to support effective study, research, and work.
This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, morphological segmentation) while learning a morpheme segmentation over the target language. Our model outperforms a competitive word alignment system in alignment quality. ...
This paper discusses a decision-tree approach to the problem of assigning probabilities to words following a given text. In contrast with previous decision-tree language model attempts, an algorithm for selecting nearly optimal questions is considered. The model is to be tested on a standard task, The Wall Street Journal, allowing a fair comparison with the well-known trigram model.
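To make the setup concrete, here is a toy sketch of a decision-tree language model in which hand-written binary questions about the history route to leaves holding next-word distributions; the paper's contribution is selecting nearly optimal questions automatically, whereas the questions and probabilities below are invented.

```python
# A toy decision-tree language model: binary questions about the history
# route to a leaf containing a next-word distribution. Questions and
# probabilities are hypothetical, not learned.
def next_word_distribution(history):
    """Route a history (list of previous words) to a leaf distribution."""
    last = history[-1] if history else "<s>"
    if last in {"the", "a", "an"}:   # question 1: is the last word a determiner?
        return {"company": 0.3, "market": 0.3, "stock": 0.4}
    if last.endswith("ly"):          # question 2: does the last word look adverbial?
        return {"rose": 0.5, "fell": 0.5}
    return {"the": 0.4, "of": 0.3, "and": 0.3}

print(next_word_distribution(["shares", "of", "the"]))
```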
This paper presents an algorithm for learning the probabilities of optional phonological rules from corpora. The algorithm is based on using a speech recognition system to discover the surface pronunciations of words in speech corpora; using an automatic system obviates expensive phonetic labeling by hand. We describe the details of our algorithm and show the probabilities the system has learned for ten common phonological rules which model reductions and coarticulation effects.
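The underlying estimate can be illustrated with simple relative-frequency counting: an optional rule's probability is the fraction of applicable contexts in which its surface effect is actually observed. The pronunciations below are invented rather than recognizer output, and flapping is used only as an example rule.

```python
# A minimal sketch: estimate an optional rule's probability as
# count(rule applied) / count(rule applicable). Toy data, flapping ("dx")
# as the example rule.
observations = [
    # (word, observed surface pronunciation, rule applicable?)
    ("butter", "b ah dx er", True),   # flap observed
    ("butter", "b ah t er",  True),   # flap not observed
    ("water",  "w ao dx er", True),
    ("walk",   "w ao k",     False),  # rule not applicable here
]

def rule_probability(observations, applied_symbol="dx"):
    applicable = [pron for _, pron, ok in observations if ok]
    applied = [pron for pron in applicable if applied_symbol in pron.split()]
    return len(applied) / len(applicable) if applicable else 0.0

print(rule_probability(observations))  # 2/3 of applicable tokens show the flap
```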
In the quest for knowledge, it is not uncommon for researchers to push the limits of simulation techniques to the point where they have to be adapted, or totally new techniques or approaches become necessary. True multiscale modeling techniques are becoming increasingly necessary given the growing interest in materials and processes whose large-scale properties depend on, or can be tuned by, their low-scale properties. An example would be nanocomposites, where embedded nanostructures completely change the matrix properties due to effects occurring at the nanoscale.
The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems.
We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.
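For concreteness, the sketch below runs standard IBM Model 1 EM training with two of the listed tweaks, extra weight on alignments to the null word and add-lambda smoothing of the translation probabilities; the constants and toy corpus are illustrative, not the settings evaluated in the paper.

```python
# IBM Model 1 EM with two hypothetical tweaks: up-weighting the null word
# and add-lambda smoothing. Corpus and constants are toy illustrations.
from collections import defaultdict

NULL = "<null>"
NULL_WEIGHT = 2.0   # illustrative extra weight on alignments to the null word
SMOOTH = 0.01       # illustrative add-lambda smoothing constant

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

tgt_vocab = {e for _, tgt in corpus for e in tgt}
# t[f][e]: probability of target word e given source word f, uniform start
t = defaultdict(lambda: defaultdict(lambda: 1.0 / len(tgt_vocab)))

def weight(f):
    return NULL_WEIGHT if f == NULL else 1.0

for _ in range(10):  # EM iterations
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for src, tgt in corpus:
        src = [NULL] + src
        for e in tgt:
            # E-step: expected alignment counts, null word up-weighted
            z = sum(weight(f) * t[f][e] for f in src)
            for f in src:
                c = weight(f) * t[f][e] / z
                counts[f][e] += c
                totals[f] += c
    # M-step: re-estimate with add-lambda smoothing over the target vocabulary
    for f in counts:
        for e in tgt_vocab:
            t[f][e] = (counts[f][e] + SMOOTH) / (totals[f] + SMOOTH * len(tgt_vocab))

print(max(t["haus"], key=t["haus"].get))  # expected to prefer "house"
```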
Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent overfitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task.
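The leaving-one-out idea can be sketched with simple counts: when scoring the phrases of a given training sentence pair, subtract that pair's own contributions before computing the relative frequency, so a phrase pair seen only in that sentence gets no support from itself. The extracted phrase pairs and the floor value below are invented for illustration.

```python
# A minimal sketch of leave-one-out phrase probability estimation:
# remove the current sentence pair's own counts before computing P(tgt | src).
# Toy phrase pairs; the floor value is an arbitrary small constant.
from collections import Counter

extracted = [  # phrase pairs extracted per training sentence pair (hypothetical)
    [("das haus", "the house"), ("haus", "house")],
    [("das haus", "the house"), ("das", "the")],
    [("ein buch", "a book"), ("buch", "book")],
]

global_pair = Counter(p for sent in extracted for p in sent)
global_src = Counter(src for sent in extracted for src, _ in sent)
FLOOR = 1e-4

def leave_one_out_prob(pair, sentence_index):
    """P(tgt | src) with the given sentence's own counts removed."""
    local_pair = Counter(extracted[sentence_index])
    local_src = Counter(src for src, _ in extracted[sentence_index])
    num = global_pair[pair] - local_pair[pair]
    den = global_src[pair[0]] - local_src[pair[0]]
    return max(num, FLOOR) / den if den > 0 else FLOOR

# ("das haus", "the house") also occurs in another sentence, so it keeps support;
# ("haus", "house") occurs only in sentence 0, so its leave-one-out score collapses.
print(leave_one_out_prob(("das haus", "the house"), 0))  # 1.0
print(leave_one_out_prob(("haus", "house"), 0))          # FLOOR
```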