Statistical natural language processing
-
The ebook "Foundations of statistical natural language processing" covers the following content: lexical acquisition, introduction, mathematical foundations, linguistic essentials, corpus-based work, collocations, statistical inference: n-gram models over sparse data, word sense disambiguation, and other topics.
704p
haojiubujain07
20-09-2023
-
Continuing from Part 1, Part 2 of the ebook "Introduction to data science: A python approach to concepts, techniques and applications" presents the following content: unsupervised learning; network analysis; recommender systems; statistical natural language processing for sentiment analysis; parallel computing; and more.
100p
dieptieuung
20-07-2023
-
The ebook Artificial intelligence: Part 2 presents the following content: symbolic reasoning under uncertainty; statistical reasoning; weak slot-and-filler structures; strong slot-and-filler structures; natural language processing; and more. Please refer to the document for more details.
119p
chankora
16-06-2023
-
Part 2 of the book "Speech and Language Processing: An introduction to natural language processing" covers: statistical parsing; language and complexity; features and unification; representing meaning; computational semantics; lexical semantics; computational lexical semantics; computational discourse; information extraction; question answering and summarization; dialogue and conversational agents; and more.
535p
britaikridanik
06-07-2022
-
The lecture “Natural language processing – Chapter 5: Foundation of statistical machine translation” covers: an introduction to statistical machine translation, statistical MT systems, three problems in statistical MT systems, the translation model, and other topics.
12p
dien_vi01
21-11-2018
-
One of the crucial factors in part-of-speech (POS) tagging approaches based on statistical methods is processing time. In this paper, we propose an approach for calculating the pruning threshold, which can be applied to the Viterbi algorithm of the Hidden Markov Model for tagging texts in natural language processing. Experiments on 1,000,000 words of the tagged Wall Street Journal corpus showed that our proposed solution is satisfactory.
10p
cumeo3000
01-08-2018
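The abstract above does not say how its pruning threshold is calculated, so the following Python sketch only illustrates where such a threshold plugs into Viterbi decoding for a first-order HMM tagger. It assumes a fixed, hypothetical `threshold` in log space: at each word, tags scoring more than `threshold` below the best tag are dropped before the next step. The tag set and probabilities are made up for illustration.

```python
import math

FLOOR = -1e9  # log-probability assigned to unseen events (an assumption)


def logp(table, key):
    """Log of a probability looked up in a dict, or FLOOR if missing."""
    p = table.get(key, 0.0)
    return math.log(p) if p > 0 else FLOOR


def viterbi_pruned(words, tags, start_p, trans_p, emit_p, threshold=10.0):
    """Most probable tag sequence under a first-order HMM, with beam pruning.

    start_p[t], trans_p[(prev, t)] and emit_p[(t, word)] are probabilities.
    `threshold` is a hypothetical fixed beam width in log space.
    """
    # Scores for the first word: log P(t) + log P(word | t).
    scores = {t: logp(start_p, t) + logp(emit_p, (t, words[0])) for t in tags}
    backptrs = []
    for word in words[1:]:
        # Pruning: only tags within `threshold` of the best score survive.
        best = max(scores.values())
        active = {t: s for t, s in scores.items() if s >= best - threshold}
        new_scores, ptr = {}, {}
        for t in tags:
            # Best surviving predecessor for tag t.
            prev, s = max(
                ((p, sp + logp(trans_p, (p, t))) for p, sp in active.items()),
                key=lambda pair: pair[1],
            )
            new_scores[t] = s + logp(emit_p, (t, word))
            ptr[t] = prev
        scores = new_scores
        backptrs.append(ptr)
    # Follow back-pointers from the best final tag.
    path = [max(scores, key=scores.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy model (made-up numbers).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("DT", "VB"): 0.1, ("NN", "VB"): 0.8,
           ("NN", "NN"): 0.2, ("VB", "DT"): 0.5, ("VB", "NN"): 0.5}
emit_p = {("DT", "the"): 0.9, ("NN", "dog"): 0.6, ("NN", "runs"): 0.1,
          ("VB", "runs"): 0.7, ("VB", "dog"): 0.05}
print(viterbi_pruned(["the", "dog", "runs"], tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VB']
```

A larger threshold keeps more hypotheses per word (slower, closer to exact Viterbi); a smaller one is faster but risks pruning the path that would have won.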
-
In this paper, we present a pre-processing approach, based on a dependency parser, for phrase-based statistical machine translation (SMT) to learn automatic and manual reordering rules from English to Vietnamese. The dependency parse trees and transformation rules are used to reorder the source sentences and are applied to systems translating from English to Vietnamese. We evaluated our approach on English-Vietnamese machine translation tasks and showed that it outperforms the baseline phrase-based SMT system.
14p
truongtien_09
10-04-2018
-
In Data Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. An input string is parsed by combining subtrees from the corpus. As a consequence, one parse tree can usually be generated by several derivations that involve different subtrees. This leads to a statistical model in which the probability of a parse is equal to the sum of the probabilities of all its derivations. (Scha, 1990) gives an informal introduction to DOP, while (Bod, 1992a) provides a formalization of the theory. ...
8p
buncha_1
08-05-2013
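The DOP probability model described above can be sketched in a few lines of Python. The sketch assumes the usual DOP1-style estimate, where a subtree's probability is its corpus count divided by the total count of subtrees with the same root label, and takes the derivations as given rather than finding them with a parser. The subtree inventory, its bracketed-string encoding, and the counts are all hypothetical.

```python
from collections import Counter
from math import prod


def subtree_probs(subtree_counts):
    """P(t) = count(t) / total count of corpus subtrees with the same root label.

    Each subtree is a (root_label, bracketed_string) pair, a simplification
    made for this illustration.
    """
    root_totals = Counter()
    for (root, _), count in subtree_counts.items():
        root_totals[root] += count
    return {t: count / root_totals[t[0]] for t, count in subtree_counts.items()}


def parse_prob(derivations, probs):
    """Sum over derivations of the product of the probabilities of their subtrees."""
    return sum(prod(probs[t] for t in deriv) for deriv in derivations)


# Hypothetical subtree inventory with made-up corpus counts.
a = ("S", "(S NP VP)")
b = ("S", "(S (NP Mary) VP)")
c = ("VP", "(VP (V likes) NP)")
d = ("VP", "(VP V NP)")
e = ("NP", "(NP Mary)")
f = ("NP", "(NP Susan)")
g = ("V", "(V likes)")
probs = subtree_probs({a: 2, b: 2, c: 1, d: 3, e: 1, f: 1, g: 1})

# The four ways this inventory can build (S (NP Mary) (VP (V likes) (NP Susan))),
# enumerated by hand here rather than by a parser.
derivations = [[a, e, c, f], [a, e, d, g, f], [b, c, f], [b, d, g, f]]
print(parse_prob(derivations, probs))  # 0.375
```

No single derivation has probability above 0.1875 here; summing over all four is what lifts the parse to 0.375, which is the effect the abstract describes.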
-
The work reported here has largely involved problems with parsing Italian. One of the typical features of Italian is a lower degree of word-order rigidity in sentences. For instance, "Paolo ama Maria" (Paolo loves Maria) may be rewritten without any significant difference in meaning (leaving aside questions of context and pragmatics) in any of the six possible permutations: Paolo ama Maria, Paolo Maria ama, Maria ama Paolo, Maria Paolo ama, ama Paolo Maria, ama Maria Paolo.
5p
buncha_1
08-05-2013
-
This paper presents a partial solution to a component of the problem of lexical choice: choosing the synonym most typical, or expected, in context. We apply a new statistical approach to representing the context of a word through lexical co-occurrence networks. The implementation was trained and evaluated on a large corpus, and results show that the inclusion of second-order co-occurrence relations improves the performance of our implemented lexical choice program.
3p
bunthai_1
06-05-2013
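The distinction between first- and second-order co-occurrence in the abstract above can be illustrated with a small Python sketch. The abstract does not specify how network links are weighted, so this sketch uses unweighted links within a fixed window, and the second-order weight `w2`, the window size, and the toy corpus are hypothetical. It is not the paper's lexical choice program.

```python
from collections import defaultdict


def cooccurrence_network(sentences, window=2):
    """First-order links: distinct words within `window` tokens of each other."""
    net = defaultdict(set)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i and tokens[j] != w:
                    net[w].add(tokens[j])
    return net


def second_order(net, word):
    """Words reachable from `word` only through one of its first-order neighbours."""
    first = net.get(word, set())
    reachable = set()
    for neighbour in first:
        reachable |= net.get(neighbour, set())
    return reachable - first - {word}


def typicality(net, candidate, context, w2=0.5):
    """Score a candidate synonym by how many context words it is linked to.

    First-order links count 1; second-order links count `w2`, a hypothetical weight.
    """
    first = net.get(candidate, set())
    second = second_order(net, candidate)
    return sum(1.0 if c in first else w2 if c in second else 0.0 for c in context)


corpus = [["strong", "tea"], ["strong", "coffee"], ["powerful", "engine"],
          ["powerful", "car"], ["tea", "cup"]]
net = cooccurrence_network(corpus)
context = ["cup"]
print(max(["strong", "powerful"], key=lambda w: typicality(net, w, context)))  # strong
```

In this toy corpus "strong" never co-occurs with "cup" directly; it wins only through the second-order path strong, tea, cup, which mirrors the kind of gain the abstract attributes to second-order relations.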
-
The information used for the extraction of terms can be considered rather 'internal', i.e. coming from the candidate string itself. This paper presents the incorporation of 'external' information derived from the context of the candidate string. It is embedded in the C-value approach for automatic term recognition (ATR) in the form of weights constructed from statistical characteristics of the context words of the candidate string.
3p
bunthai_1
06-05-2013
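The abstract builds on the C-value measure without restating it. As background, here is a minimal Python sketch of the base C-value as it is commonly defined: the term's length in words (through log2) times its frequency, discounted by the average frequency of the longer candidates that contain it. The context-word weights that this paper adds are not reproduced, because the abstract does not give their form; the candidate terms and counts are invented.

```python
import math


def is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous word sequence inside a longer term."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(candidates):
    """Base C-value for multi-word candidates given as {tuple_of_words: frequency}.

    C-value(a) = log2|a| * f(a)                              if a is not nested
               = log2|a| * (f(a) - mean frequency of the longer
                            candidates that contain a)      otherwise
    """
    scores = {}
    for a, freq in candidates.items():
        containers = [b for b in candidates if is_nested(a, b)]
        if containers:
            freq -= sum(candidates[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * freq
    return scores


# Made-up candidate frequencies.
scores = c_value({
    ("floating", "point"): 10,
    ("floating", "point", "arithmetic"): 4,
    ("floating", "point", "unit"): 2,
})
print(scores[("floating", "point")])  # 7.0
```

Subtracting the mean frequency of the containing candidates keeps a nested string from being credited for occurrences that really belong to longer terms.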
-
Many multilingual NLP applications need to translate words between different languages but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via a single threshold parameter. This feature makes the model more suitable for applications that are not fully statistical.
8p
bunthai_1
06-05-2013
-
This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS tagging. ...
8p
bunthai_1
06-05-2013
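Backed-off conditioning, which the abstract relates to Memory-Based Learning, can be illustrated with a minimal Python sketch for PP-attachment. It stores label counts for every subset of the (verb, noun1, preposition, noun2) features and, at prediction time, falls back from the most specific context to progressively more general ones until some evidence is found. This uniform all-subsets hierarchy is an assumption made for illustration; the abstract's point is that feature weighting can select a suitable hierarchy automatically, which this sketch does not do. The labels ("V" for verb attachment, "N" for noun attachment) and training examples are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def train(examples):
    """Count labels for every subset of feature positions (a memory of contexts)."""
    table = defaultdict(Counter)
    for feats, label in examples:
        n = len(feats)
        for k in range(n + 1):
            for idx in combinations(range(n), k):
                table[(idx, tuple(feats[i] for i in idx))][label] += 1
    return table


def predict(table, feats):
    """Back off from the most specific to the most general context.

    At each subset size k (n down to 0), pool counts over all k-subsets of the
    features and return the majority label at the first size with any evidence.
    """
    n = len(feats)
    for k in range(n, -1, -1):
        pooled = Counter()
        for idx in combinations(range(n), k):
            pooled.update(table.get((idx, tuple(feats[i] for i in idx)), Counter()))
        if pooled:
            return pooled.most_common(1)[0][0]
    return None


examples = [
    (("ate", "pizza", "with", "fork"), "V"),
    (("ate", "pizza", "with", "anchovies"), "N"),
    (("saw", "man", "with", "telescope"), "V"),
    (("bought", "shirt", "with", "pockets"), "N"),
    (("cut", "bread", "with", "knife"), "V"),
]
table = train(examples)
print(predict(table, ("ate", "pasta", "with", "fork")))  # V, via (ate, with, fork)
```

The size-0 subset always matches, so the final fallback is simply the most frequent label in the training data.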
-
We present and experimentally evaluate a new model of pronunciation by analogy: the paradigmatic cascades model. Given a pronunciation lexicon, this algorithm first extracts the most productive paradigmatic mappings in the graphemic domain and pairs them statistically with their correlate(s) in the phonemic domain. These mappings are used to search the lexical database and retrieve the most promising analog of unseen words. Finally, we apply the correlated series of mappings in the phonemic domain to the analog's pronunciation to obtain the desired pronunciation. ...
8p
bunthai_1
06-05-2013
-
This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-occurrence statistics to be constant over a window of several hundred words, we show that their influence is nonstationary on a much smaller time scale.
8p
bunthai_1
06-05-2013
-
The decoding algorithm is a crucial part of statistical machine translation. In this paper, we describe a stack decoding algorithm. We present the hypothesis scoring method and the heuristics used in our algorithm, and we report several techniques deployed to improve the decoder's performance. We also introduce a simplified model to alleviate the sparse data problem and to speed up decoding. We evaluate and compare these techniques and models in our statistical machine translation system.
7p
bunthai_1
06-05-2013
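To make the idea of stack decoding concrete, here is a heavily simplified Python sketch; it is not the decoder described in the abstract. It translates word by word and monotonically (no reordering), scores each hypothesis by a translation log-probability plus a bigram language-model log-probability, and keeps one stack per number of source words covered, expanding only the `beam` best hypotheses in each stack. Because every hypothesis in a stack covers the same source words, no future-cost heuristic is needed here; a decoder that allows reordering would need one. The phrase table, language model, and `unk_logp` penalty are hypothetical.

```python
import heapq


def stack_decode(source, phrase_table, bigram_lm, beam=3, unk_logp=-10.0):
    """Monotone stack decoder: stacks[i] holds hypotheses covering the first i source words.

    phrase_table: source word -> list of (target word, log P(target | source))
    bigram_lm:    (previous target word, target word) -> log P(word | previous)
    Unknown source words are copied through with log-probability `unk_logp`.
    """
    # A hypothesis is (score, output words); "<s>" marks the sentence start.
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0].append((0.0, ("<s>",)))
    for i, src in enumerate(source):
        # Histogram pruning: expand only the `beam` best hypotheses in this stack.
        for score, out in heapq.nlargest(beam, stacks[i]):
            for tgt, tm_logp in phrase_table.get(src, [(src, unk_logp)]):
                lm_logp = bigram_lm.get((out[-1], tgt), unk_logp)
                stacks[i + 1].append((score + tm_logp + lm_logp, out + (tgt,)))
    best_score, best_out = max(stacks[-1])
    return list(best_out[1:]), best_score


# Toy log-probabilities (made up).
phrase_table = {"das": [("the", -0.4), ("that", -1.2)],
                "haus": [("house", -0.2), ("home", -1.6)]}
lm = {("<s>", "the"): -0.7, ("<s>", "that"): -0.9,
      ("the", "house"): -0.5, ("the", "home"): -2.3,
      ("that", "house"): -1.2, ("that", "home"): -1.2}
print(stack_decode(["das", "haus"], phrase_table, lm)[0])  # ['the', 'house']
```

Setting `beam=1` turns this into greedy decoding; larger beams trade speed for a lower risk of search errors.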
-
In this paper, we describe a Dynamic Programming (DP) based search algorithm for statistical translation and present experimental results. The statistical translation approach uses two sources of information: a translation model and a language model. The language model used is a standard bigram model. For the translation model, the alignment probabilities are made dependent on the differences in the alignment positions rather than on the absolute positions.
8p
bunthai_1
06-05-2013
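The relative-position idea in the abstract above can be illustrated with a small Python sketch. Jump widths (differences between consecutive alignment positions) are counted from a few alignments, and the probability of aligning to position `i` given the previous position `prev` is the count of the jump `i - prev`, normalized over the positions of the current sentence. Estimating counts from given alignments (rather than, say, EM training) and the toy alignments themselves are assumptions for illustration; the DP search itself is not shown.

```python
from collections import Counter


def count_jumps(alignments):
    """Count jump widths a_j - a_{j-1} over sequences of aligned positions."""
    counts = Counter()
    for positions in alignments:
        for prev, cur in zip(positions, positions[1:]):
            counts[cur - prev] += 1
    return counts


def jump_probs(jump_counts, prev, length):
    """p(i | prev) = c(i - prev) / sum_k c(k - prev), for positions 1..length."""
    weights = {i: jump_counts.get(i - prev, 0) for i in range(1, length + 1)}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()} if total else {}


# Hypothetical alignments: each list gives the aligned position of successive words.
counts = count_jumps([[1, 2, 3], [1, 2, 4], [2, 1, 2]])
print(counts[1], counts[2], counts[-1])  # 4 1 1
# Probabilities depend only on i - prev, up to renormalization over the sentence length.
print(jump_probs(counts, prev=1, length=4))  # {1: 0.0, 2: 0.8, 3: 0.2, 4: 0.0}
```

Because only jump widths are parameterized, a handful of counts is shared across all absolute positions, which is what keeps this model much smaller than one conditioned on absolute positions.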
-
To understand a speaker's turn of a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, does POS tagging, and can be used as the language model of a speech recognizer.
8p
bunthai_1
06-05-2013
-
Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed. ...
8p
bunthai_1
06-05-2013
-
We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can be easily extended, improving the results. The tagger has been tested and evaluated on the WSJ corpus. ...
8p
bunthai_1
06-05-2013