The statistical models

We present a statistical model of Japanese unknown words consisting of a set of length and spelling models classified by the character types that constitute a word. The point is quite simple: different character sets should be treated differently and the changes between character types are very important because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). Both word segmentation accuracy and part of speech tagging accuracy are improved by the proposed model. ...

We present several unsupervised statistical models for the prepositional phrase attachment task that approach the accuracy of the best supervised methods for this task. Our unsupervised approach uses a heuristic based on attachment proximity and trains from raw text that is annotated with only part-of-speech tags and morphological base forms, as opposed to attachment information. It is therefore less resource-intensive and more portable than previous corpus-based algorithms proposed for this task. ...

Traditional concatenative speech synthesis systems use a number of heuristics to define the target and concatenation costs, essential for the design of the unit selection component. In contrast to these approaches, we introduce a general statistical modeling framework for unit selection inspired by automatic speech recognition. Given appropriate data, techniques based on that framework can result in a more accurate unit selection, thereby improving the general quality of a speech synthesizer. They can also lead to a more modular and a substantially more efficient system. ...

This paper presents a novel statistical model for automatic identification of English baseNP. It uses two steps: N-best Part-Of-Speech (POS) tagging, and baseNP identification given the N-best POS sequences. Unlike other approaches where the two steps are separated, we integrate them into a unified statistical framework. Our model also integrates lexical information. Finally, the Viterbi algorithm is applied to perform a global search over the entire sentence, allowing us to obtain linear complexity for the entire process. ...

This paper presents a Bayesian decision framework that performs automatic story segmentation based on statistical modeling of one or more lexical chain features. Automatic story segmentation aims to locate the instances in time where a story ends and another begins. A lexical chain is formed by linking coherent lexical items chronologically. A story boundary is often associated with a significant number of lexical chains ending before it, starting after it, as well as a low count of chains continuing through it.
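The three lexical chain cues named above (chains ending before a boundary, chains starting after it, and chains continuing through it) can be counted as in the following minimal sketch. The representation of a chain as a `(first_sentence, last_sentence)` index pair, the convention that boundary `b` lies between sentences `b - 1` and `b`, and the function name `chain_features` are all assumptions made for illustration; they are not taken from the paper.

```python
def chain_features(chains, boundary):
    """Count lexical chain cues around a candidate story boundary.

    Assumed conventions (hypothetical, for illustration only):
    - each chain is a (first_sentence, last_sentence) pair of indices;
    - `boundary` b lies between sentence b - 1 and sentence b.
    """
    # Chains whose last item is in the sentence just before the boundary.
    ending = sum(1 for start, end in chains if end == boundary - 1)
    # Chains whose first item is in the sentence just after the boundary.
    starting = sum(1 for start, end in chains if start == boundary)
    # Chains that begin before the boundary and extend past it.
    continuing = sum(1 for start, end in chains if start < boundary <= end)
    return ending, starting, continuing


chains = [(0, 2), (1, 2), (3, 5), (3, 4), (0, 5)]
for b in range(1, 6):
    print(b, chain_features(chains, b))
```

Under these assumptions, a candidate with many ending and starting chains and few continuing ones is the kind of position the abstract describes as a likely story boundary; how the counts are combined into a decision is left to the Bayesian framework of the paper.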

Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). One advantage of this pretranslation normalization is that the diversity in different user groups and domains can be modeled separately without accessing and adapting the language model of the MT system for each SMS application. Another advantage is ...

We present a set of algorithms that enable us to translate natural language sentences by exploiting both a translation memory and a statistical-based translation model. Our results show that an automatically derived translation memory can be used within a statistical framework to often find translations of higher probability than those found using solely a statistical model.

Most documents are about more than one subject, but many NLP and IR techniques implicitly assume documents have just one topic. We describe new clues that mark shifts to new topics, novel algorithms for identifying topic boundaries and the uses of such boundaries once identified. We report topic segmentation performance on several corpora as well as improvement on an IR task that benefits from good segmentation. Introduction: Dividing documents into topically coherent sections has many uses, but the primary motivation for this work comes from information retrieval (IR). ...

In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a nonparametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences.

In this paper we investigate how to automatically determine if two document collections are written from different perspectives. By perspectives we mean a point of view, for example, from the perspective of Democrats or Republicans. We propose a test of different perspectives based on distribution divergence between the statistical models of two collections. Experimental results show that the test can successfully distinguish document collections of different perspectives from other types of collections. ...
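As a rough illustration of "distribution divergence between the statistical models of two collections", the sketch below fits a smoothed unigram model to each collection and compares them with a symmetrised Kullback-Leibler divergence. The choice of unigram models, add-one smoothing, whitespace tokenisation, and the symmetrised form are all assumptions made here; the abstract does not say which models or divergence the authors use, and the function names are hypothetical.

```python
import math
from collections import Counter


def unigram_model(docs, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def collection_divergence(docs_a, docs_b):
    """Symmetrised KL divergence between two document collections."""
    vocab = {w for d in docs_a + docs_b for w in d.lower().split()}
    p = unigram_model(docs_a, vocab)
    q = unigram_model(docs_b, vocab)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


a = ["taxes should be lower", "lower taxes help growth"]
b = ["public services need funding", "funding helps public schools"]
print(collection_divergence(a, a))  # identical collections give 0.0
print(collection_divergence(a, b))  # differing collections give a positive value
```

A real test would also need a threshold or reference distribution for deciding when a divergence is large enough to indicate different perspectives rather than, say, different topics; that step is not sketched here.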

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.

This paper presents a noisy-channel-based Korean preprocessor system, which corrects word spacing and typographical errors. The proposed algorithm corrects both errors simultaneously. Using an Eojeol transition pattern dictionary and statistical data such as Eumjeol n-gram and Jaso transition probabilities, the algorithm minimizes the usage of huge word dictionaries.

This paper examines the current performance of the stochastic tagger PARTS (Church 88) in handling phrasal verbs, describes a problem that arises from the statistical model used, and suggests a way to improve the tagger's performance. The solution involves a change in the definition of what counts as a word for the purpose of tagging phrasal verbs.

This book is intended to introduce environmental scientists and managers to the statistical methods that will be useful for them in their work. A secondary aim was to produce a text suitable for a course in statistics for graduate students in the environmental science area. I wrote the book because it seemed to me that these groups should really learn about statistical methods in a special way. It is true that their needs are similar in many respects to those working in other areas.

(1) Since the simpler model features fewer regressors than the larger model, it follows that the VIF of the simpler model will be less than that of the larger model. The reason is that the more variables we include in the model, the greater the multicollinearity and, hence, the greater R_j^2, unless the omitted variables happen to be orthogonal to the regressors included in the simpler model. The simpler model, which omits relevant variables, produces biased estimates but with smaller variances. Consequently, there appears to be a tradeoff between bias and precision. ...
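The variance inflation factor for regressor j is conventionally defined as VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the other regressors. The sketch below illustrates the argument for the smallest interesting case, a model with two regressors, where R_j^2 is simply the squared correlation between them. The helper names and the toy data are assumptions for illustration and do not come from the source.

```python
from statistics import mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def vif(r_squared):
    """Variance inflation factor, VIF = 1 / (1 - R^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def vif_two_regressors(x1, x2):
    """VIF of either regressor in a two-regressor model.

    With only two regressors, R_j^2 equals their squared correlation.
    """
    return vif(pearson_r(x1, x2) ** 2)


# Orthogonal second regressor: adding it leaves the VIF at 1.
print(vif_two_regressors([1, -1, 1, -1], [1, 1, -1, -1]))
# Correlated second regressor: adding it pushes the VIF above 1.
print(vif_two_regressors([1, 2, 3, 4], [1, 2, 3, 5]))
```

A model with a single regressor has R_j^2 = 0 and therefore VIF = 1, so the first call reproduces the "orthogonal" exception in the text and the second shows the inflation that comes with a correlated extra regressor.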

Modeling Hydrologic Change: Statistical Methods is about modeling systems where change has affected data that will be used to calibrate and test models of the systems and where models will be used to forecast system responses after change occurs. The focus is not on the hydrology. Instead, hydrology serves as the discipline from which the applications are drawn to illustrate the principles of modeling and the detection of change. All four elements of the modeling process are discussed: conceptualization, formulation, calibration, and verification.

Statistical models are empirical. Although they are derived from observations, the relationship described must have a basis in our underlying understanding of processes if we are to have faith in the predictive capabilities of the model (National Research Council 2000).

As your choice of today will directly influence your picture of tomorrow, it is important to correctly choose your projector depending on the field of application you would like to use it in. A digital cinema projector remains after all a cinema projector! To choose the projector model it is essential that it is correctly adapted to the size of the screen that it has to illuminate ...

In addition to covering statistical methods, most of the existing books on equating also focus on the practice of equating, the implications of test development and test use for equating practice and policies, and the daily equating challenges that need to be solved. In some sense, the scope of this book is narrower than that of other existing books: to view the equating and linking process as a statistical estimation task.

Mathematical modelling is the process of formulating an abstract model in terms of mathematical language to describe the complex behaviour of a real system. Mathematical models are quantitative models and are often expressed in terms of ordinary differential equations and partial differential equations. Mathematical models can also be statistical models, fuzzy logic models and empirical relationships. In fact, any model description using mathematical language can be called a mathematical model.