In this paper, we present a new word alignment combination approach on language pairs where one language has no explicit word boundaries. Instead of combining word alignments of different models (Xiang et al., 2010), we try to combine word alignments over multiple monolingually motivated word segmentation. Our approach is based on link confidence score defined over multiple segmentations, thus the combined alignment is more robust to inappropriate word segmentation.
Previous work on the induction of selectional preferences has been mainly carried out for English and has concentrated almost exclusively on verbs and their direct objects. In this paper, we focus on class-based models of selectional preferences for German verbs and take into account not only direct objects, but also subjects and prepositional complements. We evaluate model performance against human judgments and show that there is no single method that overall performs best.
Immediately after the first drafts of the human genome sequence were reported almost
a decade ago, the importance of genomics and functional genomics studies became
well recognized across the broad disciplines of biological sciences research.
Biofuels such as bioethanol are becoming a viable alternative
to fossil fuels. Utilizing agricultural biomass for the production
of biofuel has drawn much interest in many science and
engineering disciplines. As one of the major crops, maize
offers promise in this regard. Compared to other crops with
biofuel potential, maize can provide both starch (seed) and
cellulosic (stover) material for bioethanol production.
Then, when a segment containing the prime in less than minimal combination is presented for identification, its location in cue space lies within a restricted number of units of within-cluster variance of the central location of the prime cluster. The number of such distance units determines headedness in the segment, with separate thresholds for occurrence as head and as operator. In § 3 we describe in more detail the stagewise procedure for identifying via quadratic discriminants the primes present in segments. ...
Making a Treatment Plan From information on the extent of disease and the prognosis and in conjunction with the patient's wishes, it is determined whether the treatment approach should be curative or palliative in intent. Cooperation among the various professionals involved in cancer treatment is of the utmost importance in treatment planning.
Functional Programming in C# leads you along a path that begins with the historic value of functional ideas. Inside, C# MVP and functional programming expert Oliver Sturm explains the details of relevant language features in C# and describes theory and practice of using functional techniques in C#, including currying, partial application, composition, memoization, and monads.
This report presents results of a three-year evaluation of Learn and Serve America, Higher Education (LSAHE), a program sponsored by the Corporation for National and Community Service that aims to increase involvement in community service by higher education institutions and students. LSAHE emphasizes an approach to
This paper revisits the pivot language approach for machine translation. First, we investigate three different methods for pivot translation. Then we employ a hybrid method combining RBMT and SMT systems to ﬁll up the data gap for pivot translation, where the sourcepivot and pivot-target corpora are independent. Experimental results on spoken language translation show that this hybrid method signiﬁcantly improves the translation quality, which outperforms the method using a source-target corpus of the same size. ...
The present paper describes a robust approach for abbreviating terms. First, in order to incorporate non-local information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model, or alternatively, the label encoding approach with global information. Although the two approaches compete with one another, we demonstrate that these approaches are also complementary. By combining these two approaches, experiments revealed that the proposed abbreviation generator achieved the best results for both the Chinese and English languages. ...
Inspired by previous preprocessing approaches to SMT, this paper proposes a novel, probabilistic approach to reordering which combines the merits of syntax and phrase-based SMT. Given a source sentence and its parse tree, our method generates, by tree operations, an n-best list of reordered inputs, which are then fed to standard phrase-based decoder to produce the optimal translation. Experiments show that, for the NIST MT-05 task of Chinese-toEnglish translation, the proposal leads to BLEU improvement of 1.56%. ...
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure.
The METAL machine translation project incorporates two methods of structural transfer - direct transfer and transfer by grammar. In this paper I discuss the strengths and weaknesses of these two approaches in general and with respect to the METAL project, and argue that, for many applications, a combination of the two is preferable to either alone.
In this work we propose methods to label probabilistic synchronous context-free grammar (PSCFG) rules using only word tags, generated by either part-of-speech analysis or unsupervised word class induction. The proposals range from simple tag-combination schemes to a phrase clustering model that can incorporate an arbitrary number of features. Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chineseto-English translation task. ...
Conventional sentence compression methods employ a syntactic parser to compress a sentence without changing its meaning. However, the reference compressions made by humans do not always retain the syntactic structures of the original sentences. Moreover, for the goal of ondemand sentence compression, the time spent in the parsing stage is not negligible.
This paper presents an innovative, complex approach to semantic verb classiﬁcation that relies on selectional preferences as verb properties. The probabilistic verb class model underlying the semantic classes is trained by a combination of the EM algorithm and the MDL principle, providing soft clusters with two dimensions (verb senses and subcategorisation frames with selectional preferences) as a result. A language-model-based evaluation shows that after 10 training iterations the verb class model results are above the baseline results.
This paper proposes to solve the bottleneck of finding training data for word sense disambiguation (WSD) in the domain of web queries, where a complete set of ambiguous word senses are unknown. In this paper, we present a combination of active learning and semi-supervised learning method to treat the case when positive examples, which have an expected word sense in web search result, are only given. The novelty of our approach is to use “pseudo negative examples” with reliable confidence score estimated by a classifier trained with positive and unlabeled examples.
Over several years, we have developed an approach to spoken dialogue systems that includes rule-based and trainable dialogue managers, spoken language understanding and generation modules, and a comprehensive dialogue system architecture. We present a Reinforcement Learning-based dialogue system that goes beyond standard rule-based models and computes on-line decisions of the best dialogue moves. The key concept of this work is that we bridge the gap between manually written dialog models (e.g.
In this paper, we present a hybrid method for word segmentation and POS tagging. The target languages are those in which word boundaries are ambiguous, such as Chinese and Japanese. In the method, word-based and character-based processing is combined, and word segmentation and POS tagging are conducted simultaneously. Experimental results on multiple corpora show that the integrated method has high accuracy.
We present new statistical models for jointly labeling multiple sequences and apply them to the combined task of partof-speech tagging and noun phrase chunking. The model is based on the Factorial Hidden Markov Model (FHMM) with distributed hidden states representing partof-speech and noun phrase sequences. We demonstrate that this joint labeling approach, by enabling information sharing between tagging/chunking subtasks, outperforms the traditional method of tagging and chunking in succession.