Statistical natural language processing

Showing 1-20 of 114 results for Statistical natural language processing
  • The ebook "Foundations of statistical natural language processing" covers: introduction, mathematical foundations, linguistic essentials, corpus-based work, collocations, statistical inference (n-gram models over sparse data), word sense disambiguation, lexical acquisition, and other topics.

    pdf704p haojiubujain07 20-09-2023 6 2   Download

  • Continuing from part 1, part 2 of the ebook "Introduction to data science: A python approach to concepts, techniques and applications" presents the following content: unsupervised learning; network analysis; recommender systems; statistical natural language processing for sentiment analysis; parallel computing;...

    pdf100p dieptieuung 20-07-2023 11 6   Download

  • The ebook "Artificial intelligence: Part 2" presents the following content: symbolic reasoning under uncertainty; statistical reasoning; weak slot and filler structures; strong slot and filler structures; natural language processing;...

    pdf119p chankora 16-06-2023 7 3   Download

  • Part 2 of the book "Speech and Language Processing: An introduction to natural language processing" covers: statistical parsing; language and complexity; features and unification; representing meaning; computational semantics; lexical semantics; computational lexical semantics; computational discourse; information extraction; question answering and summarization; dialogue and conversational agents;...

    pdf535p britaikridanik 06-07-2022 30 3   Download

  • The lecture “Natural language processing – Chapter 5: Foundation of statistical machine translation” covers: an introduction to statistical machine translation, statistical MT systems, three problems in statistical MT systems, the translation model, and other topics.

    pdf12p dien_vi01 21-11-2018 18 0   Download

  • One of the crucial factors in statistical approaches to POS (Part-of-Speech) tagging is processing time. In this paper, we propose an approach to calculating a pruning threshold that can be applied to the Viterbi algorithm of a Hidden Markov model for tagging texts in natural language processing. An experiment on 1,000,000 tagged words of the Wall Street Journal corpus showed that our proposed solution is satisfactory.

    pdf10p cumeo3000 01-08-2018 27 0   Download
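
The abstract above does not say how the pruning threshold is calculated, so the following is only a minimal sketch of the general idea: Viterbi decoding for an HMM tagger in which, at each word, states scoring more than a fixed margin below the best state are dropped. The function names, the `beam` parameter, and the toy probabilities are all assumptions made for illustration, not the paper's method.

```python
import math

def prune(active, beam):
    """Keep only states whose log score is within `beam` of the best one."""
    if not active:
        return active
    best = max(score for score, _ in active.values())
    return {s: v for s, v in active.items() if v[0] >= best - beam}

def viterbi_pruned(obs, states, log_start, log_trans, log_emit, beam):
    """Best tag sequence for `obs` under an HMM, with threshold pruning.

    Assumes every word can be emitted by at least one surviving state.
    `beam` is a hypothetical fixed threshold in log space.
    """
    # active maps each surviving state to (log score, best path ending there)
    active = {}
    for s in states:
        emit = log_emit[s].get(obs[0], -math.inf)
        if emit > -math.inf:
            active[s] = (log_start[s] + emit, [s])
    active = prune(active, beam)
    for word in obs[1:]:
        nxt = {}
        for s in states:
            emit = log_emit[s].get(word, -math.inf)
            if emit == -math.inf:
                continue  # this state cannot emit the word
            # only states that survived pruning are considered as predecessors
            prev = max(active, key=lambda p: active[p][0] + log_trans[p][s])
            score = active[prev][0] + log_trans[prev][s] + emit
            nxt[s] = (score, active[prev][1] + [s])
        active = prune(nxt, beam)
    return max(active.values(), key=lambda v: v[0])[1]

def logs(d):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in d.items()}

# Toy two-tag model (made-up numbers, for illustration only).
states = ["N", "V"]
log_start = logs({"N": 0.7, "V": 0.3})
log_trans = {"N": logs({"N": 0.3, "V": 0.7}), "V": logs({"N": 0.6, "V": 0.4})}
log_emit = {"N": logs({"fish": 0.6, "swim": 0.1}),
            "V": logs({"fish": 0.2, "swim": 0.6})}

tags = viterbi_pruned(["fish", "swim"], states, log_start, log_trans, log_emit, beam=5.0)
```

A smaller `beam` discards more states per word and so does less work, at the risk of pruning the state that the unpruned search would have chosen.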

  • In this paper, we present a pre-processing approach, based on a dependency parser, for phrase-based statistical machine translation (SMT) that applies automatically learned and manually written reordering rules from English to Vietnamese. The dependency parse trees and transformation rules are used to reorder the source sentences in systems translating from English to Vietnamese. We evaluated our approach on English-Vietnamese machine translation tasks, and showed that it outperforms the baseline phrase-based SMT system.

    pdf14p truongtien_09 10-04-2018 39 3   Download

  • In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to statistics in which the probability of a parse is equal to the sum of the probabilities of all its derivations. In (Scha, 1990) an informal introduction to DOP is given, while (Bod, 1992a) provides a formalization of the theory. ...

    pdf8p buncha_1 08-05-2013 46 1   Download
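
The rule stated in the abstract above, that a parse's probability is the sum of the probabilities of all its derivations, can be sketched as follows. The abstract does not say how a single derivation is scored; the sketch assumes the usual DOP convention that a derivation's probability is the product of the probabilities of the subtrees it combines. The function names and the numbers are hypothetical.

```python
from math import prod

def derivation_probability(subtree_probs):
    """Assumed convention: a derivation's probability is the product
    of the probabilities of the subtrees it combines."""
    return prod(subtree_probs)

def parse_probability(derivations):
    """Probability of a parse tree: sum over all derivations that yield it."""
    return sum(derivation_probability(d) for d in derivations)

# Three hypothetical derivations of the same parse tree, each given as
# the list of probabilities of the subtrees it uses.
derivations = [
    [0.5, 0.2],        # two subtrees
    [0.1],             # one large subtree covering the whole parse
    [0.25, 0.4, 0.5],  # three smaller subtrees
]

p = parse_probability(derivations)  # 0.1 + 0.1 + 0.05
```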

  • The work reported here has largely involved problems with parsing Italian. One of the typical features of Italian is a lower degree of word order rigidity in sentences. For instance, "Paolo ama Maria" (Paolo loves Maria) may be rewritten without any significant difference in meaning (leaving aside questions of context and pragmatics) in any of the six possible permutations: Paolo ama Maria, Paolo Maria ama, Maria ama Paolo, Maria Paolo ama, ama Paolo Maria, ama Maria Paolo.

    pdf5p buncha_1 08-05-2013 25 1   Download
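
As a quick check of the count in the abstract above, three words can be ordered in 3! = 6 ways. A short sketch using Python's standard `itertools.permutations` enumerates them; the variable names are illustrative only.

```python
from itertools import permutations

words = ["Paolo", "ama", "Maria"]

# Every ordering of the three words, joined back into a sentence.
orderings = [" ".join(p) for p in permutations(words)]

for sentence in orderings:
    print(sentence)
```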

  • This paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. We apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. The implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relations improves the performance of our implemented lexical choice program.

    pdf3p bunthai_1 06-05-2013 61 3   Download

  • The information used for the extraction of terms can be considered as rather 'internal', i.e. coming from the candidate string itself. This paper presents the incorporation of 'external' information derived from the context of the candidate string. It is embedded to the C-value approach for automatic term recognition (ATR), in the form of weights constructed from statistical characteristics of the context words of the candidate string.

    pdf3p bunthai_1 06-05-2013 63 2   Download

  • Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical.

    pdf8p bunthai_1 06-05-2013 47 3   Download
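
The abstract above does not specify what quantity the threshold is applied to. As a rough sketch of how a single threshold can trade precision against recall, the example below assumes each candidate word pair carries a confidence score and keeps only pairs at or above the threshold. The scores, the gold set, and the function names are hypothetical.

```python
def filter_translations(scored_pairs, threshold):
    """Keep word pairs whose (assumed) confidence score meets the threshold."""
    return {pair for pair, score in scored_pairs.items() if score >= threshold}

def precision_recall(predicted, gold):
    """Precision and recall of the predicted pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall

# Made-up English-French candidates with made-up scores.
scored_pairs = {
    ("house", "maison"): 0.9,
    ("dog", "chien"): 0.8,
    ("cat", "chien"): 0.4,   # a wrong pair with a lowish score
    ("cat", "chat"): 0.3,
}
gold = {("house", "maison"), ("dog", "chien"), ("cat", "chat")}

strict = precision_recall(filter_translations(scored_pairs, 0.5), gold)
loose = precision_recall(filter_translations(scored_pairs, 0.2), gold)
```

On this toy data the stricter threshold keeps only correct pairs but misses one, while the looser threshold recovers every gold pair at the cost of admitting a wrong one.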

  • This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. ...

    pdf8p bunthai_1 06-05-2013 48 2   Download

  • We present and experimentally evaluate a new model of pronunciation by analogy: the paradigmatic cascades model. Given a pronunciation lexicon, this algorithm first extracts the most productive paradigmatic mappings in the graphemic domain, and pairs them statistically with their correlate(s) in the phonemic domain. These mappings are used to search the lexical database for the most promising analog of an unseen word. Finally, we apply the correlated series of mappings in the phonemic domain to the analog's pronunciation to get the desired pronunciation. ...

    pdf8p bunthai_1 06-05-2013 48 3   Download

  • This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale.

    pdf8p bunthai_1 06-05-2013 47 6   Download

  • The decoding algorithm is a crucial part of statistical machine translation. We describe a stack decoding algorithm in this paper. We present the hypothesis scoring method and the heuristics used in our algorithm. We report several techniques deployed to improve the performance of the decoder. We also introduce a simplified model to moderate the sparse data problem and to speed up the decoding process. We evaluate and compare these techniques/models in our statistical machine translation system.

    pdf7p bunthai_1 06-05-2013 54 5   Download

  • In this paper, we describe a Dynamic Programming (DP) based search algorithm for statistical translation and present experimental results. The statistical translation uses two sources of information: a translation model and a language model. The language model used is a standard bigram model. For the translation model, the alignment probabilities are made dependent on the differences in the alignment positions rather than on the absolute positions.

    pdf8p bunthai_1 06-05-2013 30 2   Download
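
For readers unfamiliar with the bigram language model mentioned in the abstract above, here is a minimal sketch of one estimated by relative frequency, P(word | prev) = count(prev, word) / count(prev). It assumes no smoothing and uses `<s>` and `</s>` as sentence boundary markers; the abstract does not give the paper's actual estimation details, and the function names and toy corpus are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Return a function prob(prev, word) estimated by relative frequency.

    Unsmoothed: unseen bigrams and unseen contexts get probability 0.0.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        # every token except the last one serves as a context
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

# Toy corpus, for illustration only.
prob = train_bigram([
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
])

p_cat = prob("the", "cat")  # "the" occurs twice as a context, once before "cat"
```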

  • To understand a speaker's turn of a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, does POS tagging, and can be used as the language model of a speech recognizer.

    pdf8p bunthai_1 06-05-2013 56 5   Download

  • Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed. ...

    pdf8p bunthai_1 06-05-2013 48 3   Download

  • We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can be easily extended, improving the results. The tagger has been tested and evaluated on the WSJ corpus. ...

    pdf8p bunthai_1 06-05-2013 48 4   Download
