Maximum entropy models

Showing results 1-20 of 31 for "maximum entropy model"
  • In recent years, the rapid development of information technology and the worldwide demand for Internet access have led to a surge in the amount of information exchanged online. As a result, the number of documents appearing on the Internet has grown dramatically, both in volume and in range of topics. With such a massive amount of information, ...

  • The thesis is organized into three chapters as follows. Chapter 1 presents the sentiment (opinion) classification problem and its tasks. Chapter 2 presents the maximum entropy model and algorithm for sentiment classification. Chapter 3 presents the thesis's experimental evaluation results for the sentiment classification problem.

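As an illustration of the kind of classifier the thesis above describes, here is a minimal sketch of a maximum entropy (logistic regression) sentiment classifier; the tiny dataset and the use of scikit-learn are assumptions for the example, not the thesis's actual setup.

```python
# Minimal maximum entropy sentiment classifier sketch. scikit-learn's
# LogisticRegression fits a conditional log-linear (MaxEnt) model over
# bag-of-words features; the training texts below are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the movie was great and touching",
    "excellent acting, I loved it",
    "boring plot and terrible pacing",
    "I hated this film, a waste of time",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

print(model.predict(["a great film with excellent acting"]))
```

LogisticRegression is a maximum entropy classifier in the sense used in the thesis: a conditional log-linear model trained by maximizing the (regularized) log-likelihood.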

  • We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighbor blocks from bilingual data.

  • The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expense of model building. The iterative scaling algorithm used for parameter estimation is computationally expensive, while the feature selection process may require estimating parameters for many candidate features many times.

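Since the entry above centres on the cost of iterative scaling, the following is a small illustrative sketch of Generalized Iterative Scaling (GIS) for a conditional MaxEnt model on a hypothetical toy problem; the feature values, labels, and iteration count are invented for the example.

```python
# Toy Generalized Iterative Scaling (GIS) for a conditional MaxEnt model
# p(y|x) proportional to exp(sum_j lambda_j * f_j(x, y)).
# The binary feature tensor and labels below are hypothetical.
import numpy as np

# f[i, y, j] = value of feature j for input i and candidate label y.
f = np.array([
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 1]],
    [[0, 1, 1], [1, 0, 0]],
    [[1, 0, 0], [0, 1, 1]],
], dtype=float)
y_true = np.array([0, 0, 1, 1])                   # observed labels
N, Y, J = f.shape

# GIS assumes a constant total feature count C per (x, y); add a slack
# feature that pads every pair up to C.
C = f.sum(axis=2).max()
slack = (C - f.sum(axis=2))[:, :, None]
f = np.concatenate([f, slack], axis=2)

lam = np.zeros(J + 1)
emp = f[np.arange(N), y_true].mean(axis=0)        # empirical feature expectations

for _ in range(200):
    scores = f @ lam                              # (N, Y) unnormalised log-probs
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    model = (p[:, :, None] * f).sum(axis=1).mean(axis=0)   # model expectations
    lam += np.log(emp / model) / C                # the GIS update
print("learned weights:", lam)
```

Each GIS iteration recomputes the model's expected feature counts over the whole data set before adjusting every weight by log(empirical/model)/C, which is the per-iteration cost the abstract refers to.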

  • This paper proposes a method for learning translation rules from parallel corpora. This method applies the maximum entropy principle to a probabilistic model of translation rules. First, we define feature functions which express statistical properties of this model. Next, in order to optimize the model, the system iterates the following steps: (1) selects the feature function which maximizes the log-likelihood, and (2) adds this function to the model incrementally.

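To make the incremental selection loop in the entry above concrete, here is a schematic sketch of greedy forward selection of features by training-set log-likelihood gain; the synthetic data and the use of scikit-learn are assumptions for illustration, not the paper's actual system.

```python
# Greedy forward selection of features by log-likelihood gain, in the spirit
# of the incremental procedure described above. The random data is purely
# illustrative; scikit-learn is assumed to be available.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)   # 10 candidate binary features
y = (X[:, 2] + X[:, 7] > 1).astype(int)                # labels depend on features 2 and 7

def log_likelihood(cols):
    """Train a MaxEnt model on the chosen columns; return training log-likelihood."""
    if not cols:
        p = np.full_like(y, y.mean(), dtype=float)
        return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    clf = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    p = clf.predict_proba(X[:, cols])[:, 1]
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

selected, remaining = [], list(range(X.shape[1]))
current = log_likelihood(selected)
for _ in range(3):                      # add the 3 most useful features
    gains = {j: log_likelihood(selected + [j]) - current for j in remaining}
    best = max(gains, key=gains.get)
    selected.append(best)
    remaining.remove(best)
    current += gains[best]
print("selected features:", selected)
```

In the paper the candidate features are translation-rule features and the inner training step is maximum entropy parameter estimation; the loop structure (score every candidate by its likelihood gain, add the best one, repeat) is the same.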

  • We propose a statistical dialogue analysis model that determines discourse structures as well as speech acts using a maximum entropy model. The model can automatically acquire probabilistic discourse knowledge from a discourse-tagged corpus to resolve ambiguities. We propose the idea of tagging discourse segment boundaries to represent the structural information of discourse. Using this representation, we can effectively combine speech act analysis and discourse structure analysis in one framework.

  • We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach. ...

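As a toy illustration of the direct (log-linear) model described above, the sketch below scores candidate translations as a weighted sum of feature functions; the feature values and weights are entirely hypothetical.

```python
# Direct maximum entropy / log-linear translation scoring: each candidate
# translation is scored as sum_m lambda_m * h_m(e, f) and the best-scoring
# candidate is chosen. All numbers below are hypothetical.
weights = {"log_p_translation": 1.0, "log_p_language_model": 0.6, "word_count": -0.2}

candidates = [
    {"text": "the house is small",
     "features": {"log_p_translation": -4.1, "log_p_language_model": -7.2, "word_count": 4}},
    {"text": "the small house is",
     "features": {"log_p_translation": -4.1, "log_p_language_model": -9.8, "word_count": 4}},
]

def score(candidate):
    return sum(weights[name] * value for name, value in candidate["features"].items())

best = max(candidates, key=score)
print(best["text"], score(best))
```

Fixing the translation-model and language-model weights to 1 and dropping the rest recovers the classical source-channel score, the special case the abstract mentions; new knowledge sources are added simply as further feature functions with their own weights.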

  • Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Script without diacritics has considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document.

  • Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.

  • Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word sense disambiguation. One way to deal with this problem within the statistical framework is to use maximum entropy methods. In this paper, we present how to use this type of information within a statistical machine translation system. We show that it is possible to significantly decrease training and test corpus perplexity of the translation models. ...

  • In this paper, we propose adding long-term grammatical information to a Whole Sentence Maximum Entropy language model (WSME) in order to improve the performance of the model. The grammatical information was added to the WSME model as features obtained from a stochastic context-free grammar. Finally, experiments using a part of the Penn Treebank corpus were carried out and significant improvements were achieved.

  • This paper presents a detailed study of the integration of knowledge from both dependency parses and hierarchical word ontologies into a maximum-entropy-based tagging model that simultaneously labels words with both syntax and semantics. Our findings show that information from both these sources can lead to strong improvements in overall system accuracy: dependency knowledge improved performance over all classes of word, and knowledge of the position of a word in an ontological hierarchy increased accuracy for words not seen in the training data. ...

  • In this paper we examine how the differences in modelling between different data-driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second-stage classifiers. ...

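To make the combination step above concrete, here is a minimal sketch of per-token majority voting over the outputs of several taggers; the tag sequences are hypothetical, and the paper's more elaborate voting strategies and second-stage classifiers are not shown.

```python
# Per-token majority voting over the outputs of several taggers, one simple
# instance of the voting strategies discussed above. Tagger outputs are
# hypothetical; ties are broken by the order in which tags were first seen.
from collections import Counter

tagger_outputs = {
    "hmm":    ["DT", "NN", "VBZ", "JJ"],
    "memory": ["DT", "NN", "VBZ", "RB"],
    "rules":  ["DT", "JJ", "VBZ", "JJ"],
    "maxent": ["DT", "NN", "VBD", "JJ"],
}

def majority_vote(outputs):
    combined = []
    for token_tags in zip(*outputs.values()):
        tag, _ = Counter(token_tags).most_common(1)[0]
        combined.append(tag)
    return combined

print(majority_vote(tagger_outputs))   # ['DT', 'NN', 'VBZ', 'JJ']
```

The paper goes further by weighting each tagger's vote and by training second-stage classifiers on the combined outputs; plain majority voting is just the simplest member of that family.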

  • This paper proposes a novel method for learning probability models of subcategorization preference of verbs. We consider the issues of case dependencies and noun class generalization in a uniform way by employing the maximum entropy modeling method. We also propose a new model selection algorithm which starts from the most general model and gradually examines more specific models.

  • This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), and Boosting. We also investigate ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two re-ranking tasks: a parse selection task and a language model (LM) adaptation task. ...

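For the regularization contrast in the entry above, here is a minimal sketch comparing L1- and L2-regularized maximum entropy estimation; the synthetic data and the use of scikit-learn are assumptions for the example.

```python
# L1- vs L2-regularized maximum entropy (logistic regression) estimation,
# echoing the comparison in the entry above. Data is hypothetical; note how
# the L1 penalty drives many weights exactly to zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))
y = (X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=300) > 0).astype(int)

l2 = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

print("nonzero weights, L2:", np.count_nonzero(l2.coef_))
print("nonzero weights, L1:", np.count_nonzero(l1.coef_))
```

The point, as in the paper, is the behaviour of the estimators rather than raw accuracy: the L1 penalty produces sparse weight vectors while the L2 penalty shrinks all weights smoothly.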

  • This paper investigates a machine learning approach for temporally ordering and anchoring events in natural language texts. To address data sparseness, we used temporal reasoning as an oversampling method to dramatically expand the amount of training data, resulting in predictive accuracy on link labeling as high as 93% using a Maximum Entropy classifier on human annotated data. This method compared favorably against a series of increasingly sophisticated baselines involving expansion of rules derived from human intuitions. ...

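The oversampling step described above can be illustrated with a small sketch that expands annotated BEFORE links by transitive closure; the events and links are hypothetical, and the real annotation scheme uses a richer set of temporal relations.

```python
# Expanding training data by temporal reasoning: take annotated BEFORE links
# and add every pair implied by transitivity. Event names and links are
# hypothetical; the actual annotation scheme has more relation types.
def transitive_closure(before_links):
    closure = set(before_links)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

annotated = {("wake_up", "breakfast"), ("breakfast", "commute"), ("commute", "meeting")}
expanded = transitive_closure(annotated)
print(f"{len(annotated)} annotated links expanded to {len(expanded)} training links")
```

Every inferred pair can then serve as an additional training instance for the Maximum Entropy link classifier, which is how the reasoning acts as an oversampling method.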

  • This paper describes our work on building a Part-of-Speech (POS) tagger for Bengali. We have used Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language and our taggers make use of morphological and contextual information of the words. Since only a small labeled training set is available (45,000 words), a simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. ...

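As a sketch of the morphological and contextual information such a tagger can use, here is an illustrative feature-extraction function; the feature templates are hypothetical examples rather than the authors' actual feature set.

```python
# Illustrative contextual/morphological feature extraction for a MaxEnt POS
# tagger, in the spirit of the entry above. The feature templates are
# hypothetical examples, not the authors' actual features.
def extract_features(words, i, prev_tag):
    w = words[i]
    return {
        "word": w.lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>",
        "prev_tag": prev_tag,                 # contextual information
        "suffix3": w[-3:],                    # crude morphological cue
        "prefix2": w[:2],
        "is_capitalized": w[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in w),
    }

words = ["Rabindranath", "wrote", "many", "poems"]
print(extract_features(words, 1, prev_tag="NNP"))
```

A dictionary like this can be vectorized (for example with scikit-learn's DictVectorizer) and fed to a MaxEnt classifier; the morphological-analyzer output studied in the paper would enter as additional features of the same kind.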

  • Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and downstream modules to process. In previous work, we have developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries.

  • The major obstacle in morphological (sometimes called morpho-syntactic, or extended POS) tagging of highly inflective languages, such as Czech or Russian, is, given the resources possibly available, the tagset size. Typically, it is on the order of thousands. Our method uses an exponential probabilistic model based on automatically selected features. The parameters of the model are computed using simple estimates (which makes training much faster than when one uses Maximum Entropy) to directly minimize the error rate on training data.

  • In this paper, we explore the correlation of dependency relation paths to rank candidate answers in answer extraction. Using the correlation measure, we compare the dependency relations of a candidate answer and mapped question phrases in the sentence with the corresponding relations in the question. Different from previous studies, we propose an approximate phrase mapping algorithm and incorporate the mapping score into the correlation measure. The correlations are further incorporated into a Maximum Entropy-based ranking model which estimates path weights from training.
