Homophones are words of the same language that are pronounced alike even if they differ in spelling, meaning, or origin, such as "pair" and "pear". Homophones may also be spelled alike, as in "bear" (the animal) and "bear" (to carry).
Other common homophones are write and right, meet and meat, and peace and piece. When these words are spoken aloud, you have to listen to the context to know which one someone means. If they say they like your jeans (genes?), they’re probably talking about your pants and not your height and eye color — but you’d...
In 1992, one of those future teachers was still toiling in the orchards and fields of Central Washington, struggling to learn English, and dreaming of a return to teaching. Alfonso Lopez was born in a small village in Oaxaca, Mexico. By the time he arrived in Wenatchee in his mid-20s, he had already struggled through more adversity than many people face in a lifetime. The son of poor farmers, he managed to attend college and earn his teaching degree and later a master’s degree in social science. Lopez taught for five years in rural schools
This book got its start as an experiment in modern technology. When I started teaching at my present university (1998), the organization and architecture course focused on the 8088 running MS-DOS—essentially a programming environment as old as the sophomores taking the class. (This temporal freezing is unfortunately fairly common; when I took the same class during my undergraduate days, the computer whose architecture I studied was only two years younger than I was.)
We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines. ...
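The notion of an "empirical discount" above can be made concrete: for each n-gram training count c, compare c with the (length-normalized) count the same n-grams receive on held-out text. The sketch below is a minimal illustration of that measurement, not the paper's actual procedure; the scaling-by-corpus-size step and the bigram default are my own simplifying assumptions.

```python
from collections import Counter

def empirical_discounts(train_tokens, test_tokens, n=2):
    """For each training count c, estimate the average discount
    c - (scaled test count) over all n-grams seen c times in training.
    Roughly constant values across c match the modified Kneser-Ney
    assumption; values growing with c indicate domain divergence."""
    def ngrams(toks):
        return Counter(zip(*(toks[i:] for i in range(n))))
    train, test = ngrams(train_tokens), ngrams(test_tokens)
    # Scale test counts to the training corpus size so they are comparable.
    scale = sum(train.values()) / max(sum(test.values()), 1)
    by_count = {}
    for gram, c in train.items():
        by_count.setdefault(c, []).append(c - scale * test[gram])
    return {c: sum(d) / len(d) for c, d in sorted(by_count.items())}
```

Plotting the returned dictionary against c is one way to see whether discounts stay flat (same-domain) or grow linearly (cross-domain).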
This paper revisits the pivot language approach for machine translation. First, we investigate three different methods for pivot translation. Then we employ a hybrid method combining RBMT and SMT systems to fill up the data gap for pivot translation, where the source-pivot and pivot-target corpora are independent. Experimental results on spoken language translation show that this hybrid method significantly improves the translation quality, outperforming the method using a source-target corpus of the same size. ...
We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours.
Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model, which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve significant speed improvements, often by more than a factor of ten, over the conventional beam-search method at the same levels of search error and translation accuracy. ...
Previous probabilistic part-of-speech tagging models for agglutinative languages have considered only the lexical forms of morphemes, not the surface forms of words, which leads to inaccurate probability estimates. The proposed model is based on the observation that words (surface forms) sharing the same lexical form may nonetheless differ in their probabilities of occurrence; it is also designed to consider the lexical forms of words. Experiments show that the proposed model outperforms the bigram Hidden Markov model (HMM)-based tagging model.
Previous comparisons of document translation and query translation have been hampered by the differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions on the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation-based and query translation-based information retrieval, as well as into hybrid systems. ...
The undisputed favorite application for natural language interfaces has been data base query. Why? The reasons range from the relative simplicity of the task, including shallow semantic processing, to the potential real-world utility of the resultant system. Because of such reasons, the data base query task was an excellent paradigmatic problem for computational linguistics, and for the very same reasons it is now time for the field to abandon its protective cocoon and progress beyond this rather limiting task. ...
Do natural language database systems still provide a valuable environment for further work on natural language processing? Are there other systems which provide the same hard environment for testing, but allow us to explore more interesting natural language questions? In order to answer no to the first question and yes to the second (the position taken by our panel's chair), there must be an interesting language problem which is more naturally studied in some other system than in the database system. ...
We propose a novel algorithm for extracting dependencies from the derivations of a large fragment of CCG. Unlike earlier proposals, our dependency structures are always tree-shaped. We then use these dependency trees to compare the strong generative capacities of CCG and TAG and obtain surprising results: Both formalisms generate the same languages of derivation trees – but the mechanisms they use to bring the words in these trees into a linear order are incomparable.
In the past, the evaluation of machine translation systems has focused on single-system evaluations because only a few systems were available. But now there are several commercial systems for the same language pair, which requires new methods of comparative evaluation. In this paper we propose a black-box method for comparing the lexical coverage of MT systems. The method is based on lists of words from different frequency classes. It is shown how these word lists can be compiled and used for testing. We also present the results of using our method on 6 MT systems that translate...
Due to historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in Asian languages such as Chinese and Korean. Although these English terms and their equivalents in the Asian languages refer to the same concept, they are erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which this problem arises in IR and suggests a novel technique to solve it. ...
This paper presents a discriminative pruning method for n-gram language models in Chinese word segmentation. To reduce the size of the language model used in a Chinese word segmentation system, the importance of each bigram is computed in terms of a discriminative pruning criterion related to the performance loss caused by pruning that bigram. We then propose a step-by-step growing algorithm to build a language model of the desired size.
Partition-based morphology is an approach to finite-state morphology in which a grammar describes a special kind of regular relation that splits all the strings of a given tuple into the same number of substrings. These relations are compiled into finite-state machines. In this paper, we address the question of merging grammars that use different partitionings into a single finite-state machine. A morphological description may then be obtained by parallel or sequential application of constraints expressed on different partition notions (e.g. morpheme, phoneme, grapheme). ...
We have established a phonotactic language model as the solution to spoken language identification (LID). In this framework, we define a single set of acoustic tokens to represent the acoustic activities in the world’s spoken languages. A voice tokenizer converts a spoken document into a text-like document of acoustic tokens. Thus a spoken document can be represented by a count vector of acoustic tokens and token n-grams in the vector space.
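The vector-space representation described above can be sketched very simply: once a tokenizer has turned speech into a sequence of acoustic token symbols, the "document" becomes a count vector over tokens and token n-grams. The snippet below is an illustrative sketch only; the token names and the restriction to unigrams plus bigrams are my own assumptions, not the paper's configuration.

```python
from collections import Counter

def token_count_vector(token_doc, vocab):
    """Represent a tokenized 'spoken document' as a count vector over
    acoustic tokens (unigrams) and token bigrams, in vocab order."""
    counts = Counter(token_doc)                      # unigram token counts
    counts.update(zip(token_doc, token_doc[1:]))     # bigram token counts
    return [counts[feature] for feature in vocab]
```

In an LID system, such vectors for different languages would then feed a standard vector-space classifier.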
The n-gram model is a stochastic model, which predicts the next word (predicted word) given the previous words (conditional words) in a word sequence. The cluster n-gram model is a variant of the n-gram model in which similar words are classified in the same cluster. It has been demonstrated that using different clusters for predicted and conditional words leads to cluster models that are superior to classical cluster models which use the same clusters for both words. This is the basis of the asymmetric cluster model (ACM) discussed in our study. ...
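The asymmetric factorization described above is commonly written as P(w | h) ≈ P(Cp(w) | Cc(h)) · P(w | Cp(w)), where Cp and Cc are the separate clusterings for predicted and conditional words. The bigram sketch below is a minimal, unsmoothed illustration of that idea under my own simplifying assumptions (maximum-likelihood estimates, no backoff), not the ACM as trained in the study.

```python
from collections import Counter

class AsymmetricClusterBigram:
    """P(w | h) ~= P(Cp(w) | Cc(h)) * P(w | Cp(w)), with separate
    cluster maps for predicted (cp) and conditional (cc) words."""
    def __init__(self, pred_cluster, cond_cluster):
        self.cp, self.cc = pred_cluster, cond_cluster  # word -> cluster id
        self.trans = Counter()  # (Cc(h), Cp(w)) pair counts
        self.ctx = Counter()    # Cc(h) context counts
        self.emit = Counter()   # (Cp(w), w) counts at predicted positions
        self.cl = Counter()     # Cp(w) counts at predicted positions

    def train(self, tokens):
        for h, w in zip(tokens, tokens[1:]):
            self.trans[self.cc[h], self.cp[w]] += 1
            self.ctx[self.cc[h]] += 1
            self.emit[self.cp[w], w] += 1
            self.cl[self.cp[w]] += 1

    def prob(self, h, w):
        t = self.trans[self.cc[h], self.cp[w]] / max(self.ctx[self.cc[h]], 1)
        e = self.emit[self.cp[w], w] / max(self.cl[self.cp[w]], 1)
        return t * e
```

Using identical cluster maps for `cp` and `cc` recovers the classical (symmetric) cluster model, which is exactly the comparison the abstract draws.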
In this paper, we explore statistical language modelling for a speech-enabled MP3 player application by generating a corpus from the interpretation grammar written for the application with the Grammatical Framework (GF) (Ranta, 2004). We create a statistical language model (SLM) directly from our interpretation grammar and compare recognition performance of this model against a speech recognition grammar compiled from the same GF interpretation grammar.
Since 1996, ASP programmers have faced one upgrade after another, often with few visible advantages until version 3.x—it’s been quite a wild ride. Now we have the first significant improvement in ASP programming within our grasp: ASP.NET. Our reliance on a watered-down version of Visual Basic has been alleviated now that ASP.NET pages may be programmed either in Microsoft’s new and more powerful version of Visual Basic or in Microsoft’s new C-family language, C#, which is more Web friendly. ASP.NET allows programmers and developers to work with both