Xem 1-20 trên 67 kết quả Mô hình nlp
  • Mô hình tài chính - ứng dụng excel giải tài chính doanh nghiệp. Trên thực tế có nhiều vấn đề trong kinh tế và trong các hoạt động kinh doanh có những mối liên hệ với nhau không phải là mối quan hệ tuyến tính mà là phi tuyến. Sự tồn tại các mối quan hệ không theo tỷ lệ ( doanh số đạt được không theo tỷ lệ với giá bán vì giá bán có thể tăng và doanh số có thể giảm. Sự tồn tại các mối quan hệ không mang tính cộng bổ sung (rủi ro của...

    pdf7p hoangquoctrung1991 31-07-2012 48 8   Download

  • This interactive presentation describes LexNet, a graphical environment for graph-based NLP developed at the University of Michigan. LexNet includes LexRank (for text summarization), biased LexRank (for passage retrieval), and TUMBL (for binary classification). All tools in the collection are based on random walks on lexical graphs, that is graphs where different NLP objects (e.g., sentences or phrases) are represented as nodes linked by edges proportional to the lexical similarity between the two nodes.

    pdf4p hongvang_1 16-04-2013 17 3   Download

  • A layered approach to information retrieval permits the inclusion of multiple search engines as well as multiple databases, with a natural language layer to convert English queries for use by the various search engines. The NLP layer incorporates morphological analysis, noun phrase syntax, and semantic expansion based on WordNet.

    pdf7p bunrieu_1 18-04-2013 13 3   Download

  • Domain adaptation is an important problem in natural language processing (NLP) due to the lack of labeled data in novel domains. In this paper, we study the domain adaptation problem from the instance weighting perspective. We formally analyze and characterize the domain adaptation problem from a distributional view, and show that there are two distinct needs for adaptation, corresponding to the different distributions of instances and classification functions in the source and the target domains. ...

    pdf8p hongvang_1 16-04-2013 16 2   Download

  • Much effort has been put into computational lexicons over the years, and most systems give much room to (lexical) semantic data. However, in these systems, the effort put on the study and representation of lexical items to express the underlying continuum existing in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has remained embryonic.

    pdf7p bunrieu_1 18-04-2013 14 2   Download

  • The statistical modelling of language, together with advances in wide-coverage grammar development, have led to high levels of robustness and efficiency in NLP systems and made linguistically motivated large-scale language processing a possibility (Matsuzaki et al., 2007; Kaplan et al., 2004). This paper describes an NLP system which is based on syntactic and semantic formalisms from theoretical linguistics, and which we have used to analyse the entire Gigaword corpus (1 billion words) in less than 5 days using only 18 processors. ...

    pdf4p hongvang_1 16-04-2013 26 1   Download

  • We have witnessed signi cant progress in NLP applications such as information extraction IE, summarization, machine translation, cross-lingual information retrieval CLIR, etc. The progress will be accelerated by advances in speech technology, which not only enables us to interact with systems via speech but also to store and retrieve texts input via speech.

    pdf8p bunrieu_1 18-04-2013 10 1   Download

  • To facilitate the use of syntactic information in the study of child language acquisition, a coding scheme for Grammatical Relations (GRs) in transcripts of parent-child dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We discuss the use of current NLP techniques to produce the GRs in this annotation scheme. By using a statistical parser (Charniak, 2000) and memorybased learning tools for classification (Daelemans et al., 2004), we obtain high precision and recall of several GRs. ...

    pdf8p bunbo_1 17-04-2013 11 4   Download

  • In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy) are trained on the same corpus data. After comparison, their outputs are combined using several voting strategies and second stage classifiers. ...

    pdf7p bunrieu_1 18-04-2013 25 4   Download

  • This paper introduces a method for the semi-automatic generation of grammar test items by applying Natural Language Processing (NLP) techniques. Based on manually-designed patterns, sentences gathered from the Web are transformed into tests on grammaticality. The method involves representing test writing knowledge as test patterns, acquiring authentic sentences on the Web, and applying generation strategies to transform sentences into items.

    pdf4p hongvang_1 16-04-2013 33 3   Download

  • The Natural Language Toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is written in Python and distributed under the GPL open source license. Over the past year the toolkit has been rewritten, simplifying many linguistic data structures and taking advantage of recent enhancements in the Python language. This paper reports on the simplified toolkit and explains how it is used in teaching NLP....

    pdf4p hongvang_1 16-04-2013 28 3   Download

  • We study the issue of porting a known NLP method to a language with little existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods – Model Tampering and Anchored Learning. These allow fine grained analysis of the learned SVM models, which provides guidance to identify errors in the training corpus, distinguish the role and interaction of lexical features and eventually construct a model with ∼10% error reduction.

    pdf8p hongvang_1 16-04-2013 22 3   Download

  • This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), and Boosting. We also investigate ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, which is a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two re-ranking tasks: a parse selection task and a language model (LM) adaptation task. ...

    pdf8p hongvang_1 16-04-2013 20 3   Download

  • Most documents are about more than one subject, but many NLP and IR techniques implicitly assume documents have just one topic. We describe new clues that mark shifts to new topics, novel algorithms for identifying topic boundaries and the uses of such boundaries once identified. We report topic segmentation performance on several corpora as well as improvement on an IR task that benefits from good segmentation. Introduction Dividing documents into topically-coherent sections has many uses, but the primary motivation for this work comes from information retrieval (IR). ...

    pdf8p bunrieu_1 18-04-2013 12 3   Download

  • Automatically acquired lexicons with subcategorization information have already proved accurate and useful enough for some purposes but their accuracy still shows room for improvement. By means of diathesis alternation, this paper proposes a new filtering method, which improved the performance of Korhonen’s acquisition system remarkably, with the precision increased to 91.18% and recall unchanged, making the acquired lexicon much more practical for further manual proofreading and other NLP uses. ...

    pdf6p hongvang_1 16-04-2013 19 2   Download

  • The Penn Treebank does not annotate within base noun phrases (NPs), committing only to flat structures that ignore the complexity of English NPs. This means that tools trained on Treebank data cannot learn the correct internal structure of NPs. This paper details the process of adding gold-standard bracketing within each noun phrase in the Penn Treebank. We then examine the consistency and reliability of our annotations. Finally, we use this resource to determine NP structure using several statistical approaches, thus demonstrating the utility of the corpus.

    pdf8p hongvang_1 16-04-2013 15 2   Download

  • Dependency analysis of natural language has gained importance for its applicability to NLP tasks. Non-projective structures are common in dependency analysis, therefore we need fine-grained means of describing them, especially for the purposes of machine-learning oriented approaches like parsing. We present an evaluation on twelve languages which explores several constraints and measures on non-projective structures. We pursue an edge-based approach concentrating on properties of individual edges as opposed to properties of whole trees. ...

    pdf8p hongvang_1 16-04-2013 28 2   Download

  • Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. ...

    pdf8p hongvang_1 16-04-2013 14 2   Download

  • Identification of transliterated names is a particularly difficult task of Named Entity Recognition (NER), especially in the Chinese context. Of all possible variations of transliterated named entities, the difference between PRC and Taiwan is the most prevalent and most challenging. In this paper, we introduce a novel approach to the automatic extraction of diverging transliterations of foreign named entities by bootstrapping cooccurrence statistics from tagged and segmented Chinese corpus. Preliminary experiment yields promising results and shows its potential in NLP applications. ...

    pdf4p hongvang_1 16-04-2013 18 2   Download

  • In machine learning, whether one can build a more accurate classifier by using unlabeled data (semi-supervised learning) is an important issue. Although a number of semi-supervised methods have been proposed, their effectiveness on NLP tasks is not always clear. This paper presents a novel semi-supervised method that employs a learning paradigm which we call structural learning.

    pdf9p bunbo_1 17-04-2013 16 2   Download

Đồng bộ tài khoản