Exploiting parallel texts

Xem 1-6 trên 6 kết quả Exploiting parallel texts
  • This paper discusses the use of statistical word alignment over multiple parallel texts for the identification of string spans that cannot be constituents in one of the languages. This information is exploited in monolingual PCFG grammar induction for that language, within an augmented version of the inside-outside algorithm. Besides the aligned corpus, no other resources are required. We discuss an implemented system and present experimental results with an evaluation against the Penn Treebank. ...

    pdf8p bunbo_1 17-04-2013 37 1   Download

  • Among the various approaches to WSD, the supervised learning approach is the most successful to date. In this approach, we first collect a corpus in which each occurrence of an ambiguous word w has been manually annotated with the correct sense, according to some existing sense inventory in a dictionary. This annotated corpus then serves as the training material for a learning algorithm. After training, a model is automatically learned and it is used to assign the correct sense to any previously unseen occurrence of w in a new context. ...

    pdf8p bunbo_1 17-04-2013 32 1   Download

  • Text in parallel translation is a valuable resource in natural language processing. Statistical methods in machine translation (e.g. (Brown et al., 1990)) typically rely on large quantities of bilingual text aligned at the document or sentence level, and a number of approaches in the burgeoning field of crosslanguage information retrieval exploit parallel corpora either in place of or in addition to mappings between languages based on information from bilingual dictionaries (Davis and Dunning, 1995; Landauer and Littman, 1990; Hull and Oard, 1997; Oard, 1997). ...

    pdf8p bunrieu_1 18-04-2013 33 2   Download

  • This paper addresses the alignment issue in the framework of exploitation of large bimultilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. The results are then fed into a dynamic programming framework that computes the optimum alignment of units. ...

    pdf3p bunmoc_1 20-04-2013 22 1   Download

  • In adding syntax to statistical MT, there is a tradeoff between taking advantage of linguistic analysis, versus allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment.

    pdf9p hongphan_1 15-04-2013 27 1   Download

  • The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources) which are much more widely available than parallel translation data.

    pdf6p nghetay_1 07-04-2013 28 1   Download



p_strKeyword=Exploiting parallel texts

nocache searchPhinxDoc


Đồng bộ tài khoản