Nội dung của khóa luận được tổ chức thành ba chương như sau: Chương 1: Trình bày bài toán phân lớp quan điểm, nhiệm vụ của bài toán phân lớp quan điểm. Chương 2: Trình bày về mô hình và thuật toán Entropy cực đại cho bài toán phân lớp quan điểm. Chương 3: Trình bày những kết quả đánh giá thử nghiệm của khóa luận áp dụng cho bài toán phân lớp quan điểm.
We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.
We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predicate reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighbor blocks from bilingual data.
Maximum entropy framework proved to be expressive and powerful for the statistical language modelling, but it suffers from the computational expensiveness of the model building. The iterative scaling algorithm that is used for the parameter estimation is computationally expensive while the feature selection process might require to estimate parameters for many candidate features many times.
This paper proposes a learning method of translation rules from parallel corpora. This method applies the maximum entropy principle to a probabilistic model of translation rules. First, we define feature functions which express statistical properties of this model. Next, in order to optimize the model, the system iterates following steps: (1) selects a feature function which maximizes loglikelihood, and (2) adds this function to the model incrementally.
We propose a statistical dialogue analysis model to determine discourse structures as well as speech acts using maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from a discourse tagged corpus to resolve ambiguities. We propose the idea of tagging discourse segment boundaries to represent the structural information of discourse. Using this representation we can effectively combine speech act analysis and discourse structure analysis in one framework.
We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is signiﬁcantly improved using this approach. ...
Explanation-based generalization is used to extract a specialized grammar from the original one using a training corpus of parse trees. This allows very much faster parsing and gives a lower error rate, at the price of a small loss in coverage. Previously, it has been necessary to specify the tree-cutting criteria (or operationality criteria) manually; here they are derived automatically from the training set and the desired coverage of the specialized grammar.
This paper describes a dependency structure analysis of Japanese sentences based on the maximum entropy models. Our model is created by learning the weights of some features from a training corpus to predict the dependency between bunsetsus or phrasal units. The dependency accuracy of our system is 87.2% using the Kyoto University corpus. We discuss the contribution of each feature set and the relationship between the number of training data and the accuracy.
Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of decision trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to four phrase chunking tasks: Portuguese noun phrase chunking, English base noun phrase chunking, English text chunking and Hindi text chunking. In all four tasks, ETL shows better results than Decision Trees and also than TBL with hand-crafted templates.
This paper studies the impact of written language variations and the way it affects the capitalization task over time. A discriminative approach, based on maximum entropy models, is proposed to perform capitalization, taking the language changes into consideration. The proposed method makes it possible to use large corpora for training. The evaluation is performed over newspaper corpora using different testing periods. The achieved results reveal a strong relation between the capitalization performance and the elapsed time between the training and testing data periods. ...
Maximum entropy (Maxent) is useful in many areas. Iterative scaling (IS) methods are one of the most popular approaches to solve Maxent. With many variants of IS methods, it is difﬁcult to understand them and see the differences. In this paper, we create a general and uniﬁed framework for IS methods. This framework also connects IS and coordinate descent (CD) methods. Besides, we develop a CD method for Maxent. Results show that it is faster than existing iterative scaling methods1 .
Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Script without diacritics have considerable ambiguity because many words with diﬀerent diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document.
Figure 1: Intuitive illustration of a variety of successive tokens and a word boundary mentation by formalizing the uncertainty of successive tokens via the branching entropy (which we mathematically de ne in the next section). Our intention in this paper is above all to study the fundamental and scienti c statistical property underlying language data, so that it can be applied to language engineering. The above assumption (A) dates back to the fundamental work done by Harris (Harris, 1955), where he says that when the number of di erent tokens coming after every pre x of a word marks...
Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.
In this paper, we propose adding long-term grammatical information in a Whole Sentence Maximun Entropy Language Model (WSME) in order to improve the performance of the model. The grammatical information was added to the WSME model as features and were obtained from a Stochastic Context-Free grammar. Finally, experiments using a part of the Penn Treebank corpus were carried out and signiﬁcant improvements were acheived.
Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word sense disambiguation. One way to deal with this problem within the statistical framework is to use maximum entropy methods. In this paper, we present how to use this type of information within a statistical machine translation system. We show that it is possible to signiﬁcantly decrease training and test corpus perplexity of the translation models. ...
We present a constancy rate principle governing language generation. We show that this principle implies that local measures of entropy (ignoring context) should increase with the sentence number. We demonstrate that this is indeed the case by measuring entropy in three diﬀerent ways. We also show that this eﬀect has both lexical (which words are used) and non-lexical (how the words are used) causes.
In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence. We mainly focus on the quality of the different alignment and ME models and show that the direct ME approach outperforms the alignment templates method. ...
This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (Gis) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar. ...