Showing 1-20 of 172 results for "Bilingualism"
  • This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, morphological segmentation) while learning a morpheme segmentation over the target language. Our model outperforms a competitive word alignment system in alignment quality. ...

    PDF, 10 pages; uploaded by hongdo_1 on 12-04-2013

  • Most previous work on multilingual sentiment analysis has focused on methods to adapt sentiment resources from resource-rich languages to resource-poor languages. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data. We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language. ...

    PDF, 11 pages; uploaded by hongdo_1 on 12-04-2013

  • Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to learn an interlingual representation, making them sensitive to the domain of the training data. In this paper, we learn an interlingual representation in an unsupervised manner using only a bilingual dictionary.

    PDF, 6 pages; uploaded by hongdo_1 on 12-04-2013

  • Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross-language information retrieval. In this paper, based on the observation that bilingual data on many web pages appear together and follow similar patterns, we propose an adaptive pattern-based bilingual data mining method.

    PDF, 9 pages; uploaded by hongphan_1 on 14-04-2013

  • This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora and to phrasal translation, together with evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for acquiring bilingual terminology from comparable corpora and for disambiguating and selecting the best translation alternatives on the basis of linguistic knowledge.

    PDF, 4 pages; uploaded by bunbo_1 on 17-04-2013

  • This paper studies and evaluates disambiguation strategies for the translation of tense between German and English, using a bilingual corpus of appointment scheduling dialogues. It describes a scheme to detect complex verb predicates based on verb form subcategorization and grammatical knowledge. The extracted verb and tense information is presented and the role of different context factors is discussed.

    PDF, 5 pages; uploaded by bunrieu_1 on 18-04-2013

  • Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high quality lexicon from a noisy one, which only requires an independent corpus for each language.

    PDF, 10 pages; uploaded by hongdo_1 on 12-04-2013

  • Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource-deprived language (L1) can benefit from the annotation work done in a resource-rich language (L2) via parameter projection. However, this method assumes that sufficient annotated data are available in one resource-rich language, which may not always be the case. Instead, we focus on the situation where there are two resource-deprived languages, each having a very small amount of seed annotated data and a large amount of untagged data. ...

    PDF, 9 pages; uploaded by hongdo_1 on 12-04-2013

  • This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data make it possible to capture both lexical relations between single words and contextual information useful for inference.

    PDF, 10 pages; uploaded by hongdo_1 on 12-04-2013

  • Cross-language document summarization is defined as the task of producing a summary in a target language (e.g. Chinese) for a set of documents in a source language (e.g. English). Existing methods for addressing this task make use of either the information from the original documents in the source language or the information from the translated documents in the target language. In this study, we propose to use the bilingual information from both the source and translated documents for this task. ...

    PDF, 10 pages; uploaded by hongdo_1 on 12-04-2013

  • In this paper we study the problem of enhancing the comparability of bilingual corpora in order to improve the quality of bilingual lexicons extracted from comparable corpora. We introduce a clustering-based approach for enhancing corpus comparability that exploits the homogeneity of the corpus while preserving most of the vocabulary of the original corpus.

    PDF, 6 pages; uploaded by hongdo_1 on 12-04-2013

  • This paper proposes an approach to enhance dependency parsing in a language by using a treebank translated from another language. A simple statistical machine translation method, word-by-word decoding, which requires only a bilingual lexicon rather than a parallel corpus, is adopted for translating the treebank. Using an ensemble method, the key information extracted from word pairs with dependency relations in the translated text is effectively integrated into the parser for the target language.

    PDF, 9 pages; uploaded by hongphan_1 on 14-04-2013

  • This paper proposes a novel framework called bilingual co-training for large-scale, accurate acquisition of monolingual semantic knowledge. In this framework, we combine the independent processes of monolingual semantic knowledge acquisition for two languages, using bilingual resources to boost performance. We apply this framework to large-scale hyponymy-relation acquisition from Wikipedia.

    PDF, 9 pages; uploaded by hongphan_1 on 14-04-2013

  • Web search quality can vary widely across languages, even for the same information need. We propose to exploit this variation in quality by learning a ranking function on bilingual queries: queries that appear in query logs for two languages but represent equivalent search interests. For a given bilingual query, along with corresponding monolingual query log and monolingual ranking, we generate a ranking on pairs of documents, one from each language. Then we learn a linear ranking function which exploits bilingual features on pairs of documents, as well as standard monolingual features. ...

    PDF, 9 pages; uploaded by hongphan_1 on 14-04-2013

  • In this paper we introduce a bilingual dictionary generation tool that does not use any large bilingual corpora. With this tool we implement our novel pivot-based bilingual dictionary generation method, which mainly uses the WordNet of the pivot language to build a new bilingual dictionary. We use WordNet to obtain good accuracy, and we also introduce a double directional selection method with local thresholds to maximize recall.

    PDF, 4 pages; uploaded by hongphan_1 on 15-04-2013

  • We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows us to reinterpret the methods proposed so far and to identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of the extracted lexicons.

    PDF, 8 pages; uploaded by bunbo_1 on 17-04-2013

  • This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters, in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other one, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model. ...

    PDF, 7 pages; uploaded by bunrieu_1 on 18-04-2013

  • A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.

    PDF, 8 pages; uploaded by bunrieu_1 on 18-04-2013

  • Text in parallel translation is a valuable resource in natural language processing. Statistical methods in machine translation (e.g., Brown et al., 1990) typically rely on large quantities of bilingual text aligned at the document or sentence level, and a number of approaches in the burgeoning field of cross-language information retrieval exploit parallel corpora either in place of or in addition to mappings between languages based on information from bilingual dictionaries (Davis and Dunning, 1995; Landauer and Littman, 1990; Hull and Oard, 1997; Oard, 1997). ...

    PDF, 8 pages; uploaded by bunrieu_1 on 18-04-2013

  • We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a language and assigning them a higher cooccurrence score with a given word in the other language than each single word would have otherwise.

    PDF, 8 pages; uploaded by bunmoc_1 on 20-04-2013
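The adaptive pattern-based mining entry above rests on the observation that bilingual pairs on a web page recur in one shared layout. The following is a minimal sketch of that intuition only, not the paper's adaptive method: the single fixed pattern (a Chinese term followed by an English term in parentheses), `PAIR_PATTERN`, `mine_pairs` and the `min_count` threshold are all hypothetical.

```python
import re

# Hypothetical fixed pattern: a run of CJK ideographs followed by an English
# term in ASCII or full-width parentheses, e.g. "机器翻译 (machine translation)".
PAIR_PATTERN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[（(]\s*([A-Za-z][A-Za-z \-]*?)\s*[）)]"
)


def mine_pairs(text, min_count=2):
    """Return (Chinese, English) pairs found on one page.

    Pairs are kept only if the pattern fires at least `min_count` times,
    a crude stand-in for the idea that genuine bilingual data on a page
    recurs in the same layout rather than appearing once by accident.
    """
    pairs = PAIR_PATTERN.findall(text)
    return pairs if len(pairs) >= min_count else []


page = "机器翻译 (machine translation) 与 信息检索（information retrieval）"
print(mine_pairs(page))
```

A real adaptive system would induce such patterns from seed pairs on each page instead of hard-coding one regular expression.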
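Two entries above (the two-stage model for comparable corpora and the geometric view of lexicon extraction) build on extracting translations from comparable corpora. The sketch below shows the widely used context-vector baseline rather than either paper's method: co-occurrence vectors are built per language, source vectors are mapped into the target language through a small seed dictionary, and target words are ranked by cosine similarity. Tokenized input, the window size and all function names are assumptions, and the toy data are parallel only for brevity.

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- `window` tokens."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            vec = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vectors


def translate_vector(vec, seed_dict):
    """Project a source context vector into the target language.

    Context words missing from the seed dictionary are simply dropped.
    """
    out = Counter()
    for word, count in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += count
    return out


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_dict, k=3):
    """Return the k target words whose contexts best match `word`'s context."""
    query = translate_vector(src_vectors[word], seed_dict)
    scored = [(cosine(query, vec), cand) for cand, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored[:k]]


src = [["le", "chat", "boit", "du", "lait"], ["le", "chien", "mange", "la", "viande"]]
tgt = [["the", "cat", "drinks", "some", "milk"], ["the", "dog", "eats", "the", "meat"]]
seed = {"le": "the", "la": "the", "du": "some", "boit": "drinks",
        "lait": "milk", "mange": "eats", "viande": "meat"}
src_vec, tgt_vec = context_vectors(src), context_vectors(tgt)
print(rank_translations("chat", src_vec, tgt_vec, seed))
```

With realistic corpora, raw counts are usually replaced by an association measure such as PMI or log-likelihood before the vectors are compared.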
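For the entry on enhancing corpus comparability, it can help to make "comparability" concrete. The sketch below computes one simple dictionary-based score: the share of dictionary-covered words, counted in both directions, whose translation also occurs in the other corpus. The exact formula and the name `comparability` are illustrative assumptions, vocabularies are assumed to be Python sets, and the clustering step the paper proposes is not shown.

```python
def comparability(src_vocab, tgt_vocab, dictionary):
    """Share of dictionary-covered words, in both directions, whose
    translation also appears in the other corpus (illustrative formula).

    `src_vocab` and `tgt_vocab` are sets; `dictionary` maps each source
    word to a set of target translations.
    """
    # Invert the source->target dictionary so target words can be looked up.
    reverse = {}
    for s, translations in dictionary.items():
        for t in translations:
            reverse.setdefault(t, set()).add(s)

    covered = found = 0
    for vocab, other, lexicon in ((src_vocab, tgt_vocab, dictionary),
                                  (tgt_vocab, src_vocab, reverse)):
        for word in vocab:
            if word in lexicon:          # word has at least one translation
                covered += 1
                if lexicon[word] & other:  # some translation occurs on the other side
                    found += 1
    return found / covered if covered else 0.0


print(comparability({"chat", "chien", "maison"}, {"cat", "house", "car"},
                    {"chat": {"cat"}, "chien": {"dog"},
                     "maison": {"house"}, "voiture": {"car"}}))
```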
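The treebank-translation entry relies on word-by-word decoding, which substitutes word forms while leaving the dependency structure untouched. A minimal sketch, assuming CoNLL-style 1-based head indices and a plain Python dict as the bilingual lexicon (both hypothetical stand-ins; the paper's decoder and ensemble parser are not reproduced), shows how head-dependent word pairs can then be collected from the translated text.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    head: int  # 1-based index of the head token; 0 marks the root


def translate_sentence(tokens, lexicon):
    """Word-by-word 'decoding': substitute each form, keep every arc.

    Words missing from the lexicon are copied through untranslated.
    """
    return [Token(lexicon.get(tok.form, tok.form), tok.head) for tok in tokens]


def dependency_pairs(tokens):
    """(head form, dependent form) pairs for all non-root tokens."""
    return [(tokens[tok.head - 1].form, tok.form) for tok in tokens if tok.head > 0]


def count_pairs(treebank, lexicon):
    """Aggregate head-dependent pair counts over a translated treebank."""
    counts = Counter()
    for sentence in treebank:
        counts.update(dependency_pairs(translate_sentence(sentence, lexicon)))
    return counts


# Toy source sentence "I eat apples" with "eat" as the root.
sentence = [Token("I", 2), Token("eat", 0), Token("apples", 2)]
lexicon = {"I": "我", "eat": "吃", "apples": "苹果"}
print(count_pairs([sentence], lexicon))
```

Pair counts of this kind are the sort of signal that could be fed to a target-language parser as extra features.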
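The pivot-based dictionary entry chains a source-to-pivot and a pivot-to-target dictionary and then filters candidates in both directions with local thresholds. The sketch below illustrates that pipeline with plain dictionaries in place of WordNet. The scoring (number of distinct pivot paths, summed over both directions), the `local_ratio` threshold and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def compose(first, second):
    """Chain two dictionaries (word -> set of translations) through a pivot.

    The score of a pair is the number of distinct pivot words linking it.
    """
    scores = defaultdict(lambda: defaultdict(int))
    for word, pivots in first.items():
        for pivot in pivots:
            for target in second.get(pivot, ()):
                scores[word][target] += 1
    return scores


def pivot_dictionary(a_to_p, p_to_b, b_to_p, p_to_a, local_ratio=0.5):
    """Build an A->B dictionary via pivot P with a bidirectional check.

    A candidate survives only if it also leads back to the source word
    through B->P->A; survivors must then reach `local_ratio` times the best
    score for that source word (a per-word, i.e. local, threshold).
    """
    forward = compose(a_to_p, p_to_b)
    backward = compose(b_to_p, p_to_a)
    result = {}
    for word, candidates in forward.items():
        combined = {}
        for cand, score in candidates.items():
            back = backward.get(cand, {}).get(word, 0)
            if back > 0:
                combined[cand] = score + back
        if not combined:
            continue
        best = max(combined.values())
        kept = [c for c, s in combined.items() if s >= local_ratio * best]
        result[word] = sorted(kept, key=lambda c: (-combined[c], c))
    return result


# Toy Spanish -> English -> German data (hypothetical entries).
a_to_p = {"gato": {"cat", "tomcat"}}
p_to_b = {"cat": {"Katze", "Mieze"}, "tomcat": {"Kater", "Katze"}}
b_to_p = {"Katze": {"cat", "tomcat"}, "Kater": {"tomcat"}}
p_to_a = {"cat": {"gato"}, "tomcat": {"gato"}}
print(pivot_dictionary(a_to_p, p_to_b, b_to_p, p_to_a))
```

In the toy data, "Mieze" is reachable forward but has no path back to "gato", so the bidirectional check drops it; raising `local_ratio` to 0.6 would also drop the weaker "Kater".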
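The last entry evaluates its improved co-occurrence dictionary with the Competitive Linking algorithm. The sketch below pairs a sentence-level Dice co-occurrence score with the greedy competitive-linking step: repeatedly link the highest-scoring pair whose words are both still free, so every word takes part in at most one link. Dice is an assumed choice of association measure, the function names are hypothetical, and the paper's word-clustering refinement is not included.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Sentence-level co-occurrence Dice: 2 * c(s, t) / (c(s) + c(t))."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_types, t_types = set(src), set(tgt)
        src_count.update(s_types)
        tgt_count.update(t_types)
        joint.update(product(s_types, t_types))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in joint.items()}


def competitive_linking(src, tgt, scores):
    """Greedy one-to-one alignment of two token lists (returns index pairs)."""
    candidates = sorted(
        ((scores.get((s, t), 0.0), i, j)
         for i, s in enumerate(src) for j, t in enumerate(tgt)),
        key=lambda x: (-x[0], x[1], x[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in candidates:
        # Skip zero scores and any word that is already linked.
        if score > 0 and i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


bitext = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"]),
          (["maison", "bleue"], ["blue", "house"])]
scores = dice_scores(bitext)
print(competitive_linking(["maison", "bleue"], ["blue", "house"], scores))
```

In the toy bitext "la" also co-occurs with "house", but because "la"-"the" and "maison"-"house" score higher, the one-to-one constraint blocks that indirect association.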