Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors.
The Oxford Monographs on Criminal Law and Justice series covers all aspects of criminal law and procedure including criminal evidence. The scope of the series is wide, encompassing both practical and theoretical works.
This volume is a thematic collection of essays on sentencing theory by leading writers.
We extend the original entity-based coherence model (Barzilay and Lapata, 2008) by learning from more fine-grained coherence preferences in training data. We associate multiple ranks with the set of permutations originating from the same source document, as opposed to the original pairwise rankings. We also study the effect of the permutations used in training, and the effect of the coreference component used in entity extraction.
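As a rough illustration of moving from pairwise rankings to multiple ranks, the sketch below orders the permutations of one source document by their distance from the original sentence order and then expands those ranks into pairwise training preferences. Using Kendall tau distance as the ranking criterion, and all helper names, are assumptions made for this sketch, not the authors' actual setup.

```python
from itertools import combinations

def kendall_tau_distance(perm):
    """Number of discordant pairs relative to the original order 0..n-1."""
    return sum(1 for a, b in combinations(perm, 2) if a > b)

def rank_permutations(perms):
    """Assign fine-grained ranks to permutations of one source document:
    fewer discordant pairs means closer to the original, hence a better
    (lower) rank. Rank 0 is the most coherent ordering."""
    order = sorted(perms, key=kendall_tau_distance)
    return {tuple(p): r for r, p in enumerate(order)}

def training_pairs(ranks):
    """Expand the multi-rank annotation into pairwise preferences:
    every lower-ranked permutation is preferred over every higher-ranked one."""
    items = list(ranks.items())
    return [(p, q) for p, rp in items for q, rq in items if rp < rq]

# Three permutations of a 3-sentence document, from intact to fully reversed.
perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]
ranks = rank_permutations(perms)
pairs = training_pairs(ranks)
```

With multiple ranks, one document yields more preference pairs than a single original-vs-permuted comparison, which is the sense in which the supervision is finer-grained.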
Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain.
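A minimal sketch of the dynamic-combination idea: each component system scores a hypothesis during decoding, and the ensemble mixes those scores with per-model weights. The two mixture operations and the toy scoring functions below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def ensemble_score(hypothesis, models, weights, mode="wsum"):
    """Combine per-model scores for one hypothesis at decoding time.

    models:  scoring functions returning a log-probability, one per system
    weights: per-model interpolation weights
    mode:    'wsum' -> weighted sum in probability space
             'wmax' -> back off to the single best weighted model
    """
    scores = [w * math.exp(m(hypothesis)) for m, w in zip(models, weights)]
    if mode == "wsum":
        return math.log(sum(scores))
    return math.log(max(scores))

# Toy component "models" for a domain-adaptation setting: a general-domain
# system and an in-domain (medical) system, each returning a log-probability.
general = lambda h: -2.0
medical = lambda h: -0.5

s = ensemble_score("toy hypothesis", [general, medical], [0.3, 0.7])
```

Because the mixture is computed per hypothesis at decoding time, no single merged model has to be retrained when a new domain weighting is needed.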
Prior approaches to sentence compression have taken low level syntactic constraints into account in order to maintain grammaticality. We propose and successfully evaluate a more comprehensive, generalizable feature set that takes syntactic and structural relationships into account in order to sustain variable compression rates while making compressed sentences more coherent, grammatical and readable.
A certain range of sentences in a text is widely assumed to form a coherent unit called a discourse segment. Identifying segment boundaries is a first step toward recognizing the structure of a text. In this paper, we describe a method for identifying the segment boundaries of a Japanese text with the aid of multiple surface linguistic cues, though our experiments are small-scale. We also present a method for automatically training the weights of multiple linguistic cues without overfitting. ...
Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual sample of Swiss economic reports.
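The length-based idea can be sketched as a dynamic program that prefers pairing sentences of similar character length. The cost function and skip penalty below are simplified assumptions for illustration, not the statistical model the paper actually fits, and only 1-1, 1-0 and 0-1 "beads" are handled (the full method also allows 2-1 and 1-2 alignments).

```python
def align_by_length(src, tgt):
    """Align two lists of sentences by minimizing total character-length
    mismatch: pairing two sentences costs the difference of their lengths,
    and leaving a sentence unmatched pays a fixed penalty."""
    SKIP = 50  # penalty for an unmatched sentence (an untuned assumption)
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # 1-1 bead: pair src[i-1] with tgt[j-1]
                c = cost[i - 1][j - 1] + abs(len(src[i - 1]) - len(tgt[j - 1]))
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j - 1)
            if i > 0:  # 1-0 bead: src[i-1] unmatched
                c = cost[i - 1][j] + SKIP
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:  # 0-1 bead: tgt[j-1] unmatched
                c = cost[i][j - 1] + SKIP
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i, j - 1)
    # Trace back the cheapest path into (src_index, tgt_index) pairs.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    return pairs[::-1]

aligned = align_by_length(["Hello world.", "Short."],
                          ["Bonjour le monde.", "Court."])
```

The attraction of the approach is exactly this simplicity: character lengths need no dictionaries or parsing, so the same aligner works across language pairs.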
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. We present here criteria and techniques for automatically detecting the presence and location of a repair and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.

INTRODUCTION
Spontaneous spoken language often includes speech that is not intended by the speaker to be part of the content of the utterance. ...
We present an algorithm for simultaneously constructing both the syntax and semantics of a sentence using a Lexicalized Tree Adjoining Grammar (LTAG). This approach captures naturally and elegantly the interaction between pragmatic and syntactic constraints on descriptions in a sentence, and the inferential interactions between multiple descriptions in a sentence. At the same time, it exploits linguistically motivated, declarative specifications of the discourse functions of syntactic constructions to make contextually appropriate syntactic choices. ...
THE UNIVERSITY OF MICHIGAN undertook research, late in 1955, in the analysis of language structure for mechanical translation. Emphasis was placed on the use of the contextual structure of the sentence as a means of reducing ambiguity and on the formulation of a set of operative rules which an electronic computer could use for automatically translating Russian texts into English.
This paper proposes a novel method that exploits multiple resources to improve statistical machine translation (SMT) based paraphrasing. In detail, a phrasal paraphrase table and a feature function are derived from each resource, which are then combined in a log-linear SMT model for sentence-level paraphrase generation. Experimental results show that the SMT-based paraphrasing model can be enhanced using multiple resources. The phrase-level and sentence-level precision of the generated paraphrases are above 60% and 55%, respectively.
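The log-linear combination of per-resource feature functions can be sketched as follows. The table format, the weights, and the smoothing floor for unseen pairs are assumptions made for this illustration, not the paper's actual parameterization.

```python
import math

def loglinear_score(phrase_pair, tables, weights):
    """Score a phrasal paraphrase pair as a weighted sum of log feature
    values, one feature function derived from each resource (table).
    Pairs missing from a table back off to a small floor probability."""
    FLOOR = 1e-6  # assumed smoothing floor for unseen pairs
    score = 0.0
    for table, w in zip(tables, weights):
        score += w * math.log(table.get(phrase_pair, FLOOR))
    return score

# Two hypothetical paraphrase tables derived from different resources,
# e.g. a thesaurus and a bilingual pivot (names are illustrative only).
thesaurus = {("burglar", "thief"): 0.4}
pivot = {("burglar", "thief"): 0.25}

score = loglinear_score(("burglar", "thief"), [thesaurus, pivot], [1.0, 1.0])
```

In a full SMT decoder these per-pair scores would compete alongside language-model and reordering features; the point here is only how evidence from multiple resources combines additively in log space.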
Although most NLP researchers agree that a level of "logical form" is a necessary step toward the goal of representing the meaning of a sentence, few people agree on the content and form of this level of representation. An even smaller number of people have considered the complex action sentences that are often expressed in task-oriented dialogues. Most existing logical form representations have been developed for single-clause sentences that express assertions about properties or actual actions and in which time is not a main concern.
We describe a word alignment platform which performs the text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence alignment) required for accurate word alignment. The platform combines two different methods, producing distinct alignments. The basic word aligners are described in some detail and are individually evaluated. The union of the individual alignments is subject to a filtering post-processing phase. Two different filtering methods are also presented. The evaluation shows that the combined word alignment contains 10.
This paper describes a computational model of human sentence processing based on the principles and parameters paradigm of current linguistic theory. The syntactic processing model posits four modules, recovering phrase structure, long-distance dependencies, coreference, and thematic structure. These four modules are implemented as meta-interpreters over their relevant components of the grammar, permitting variation in the deductive strategies employed by each module.
This compact TOEIC Test Package is the only printable TOEIC word test collection currently available on the Internet. But unlike traditional TOEIC textbooks, your TOEIC Test Package provides you with a system that constantly monitors your learning progress and improves your TOEIC language skills. You will find it fun taking these unique multiple choice tests because with every question you answer correctly your TOEIC English improves. You will find some of the test questions easy, while others might contain new phrases and expressions.
Most states have, or are developing, tests to assess their students’ proficiency in state frameworks of curriculum. Many of these states are including students with limited English proficiency in this assessment process, but a significant number of LEP students have difficulty passing these standardized tests. In this website, Longman is pleased to provide additional practice for LEP students by offering sample standardized reading tests for grades 1 to 8. The reading tests provided here are a combination of multiple choice, short-answer, and long-answer questions.
This exercise lets you review some of the more common uses of 'grammar'-type words (prepositions, conjunctions, pronouns, etc.) in context. Use one word to complete each gap in the sentences. In some cases, there may be more than one alternative answer, but you should just give one of them.
The parallelism between the Arabic and English sentences is quite clear in the learners' errors. The two examples above demonstrate that in the first sentence the students dropped the verb 'to be', while in the second they used the verb 'to be' but deleted the indefinite article. This supports the students' comments that they know the grammatical rules underlying the deviant sentences they have produced, but that reliance on their native language led them to produce these errors. ...
This book addresses state-of-the-art systems and achievements in various topics in the research field of speech and language technologies. Book chapters are organized in different sections covering diverse problems, which have to be solved in speech recognition and language understanding systems. In the first section machine translation systems based on large parallel corpora using rule-based and statistical-based translation methods are presented.