Machine translation to larger corpora

Showing 1-2 of 2 results for "Machine translation to larger corpora"
  • In this paper we describe a novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations. We detail the computational complexity and average retrieval times for looking up phrase translations in our suffix array-based data structure. We show how sampling can be used to reduce the retrieval time by orders of magnitude with no loss in translation quality. ...

    PDF, 8 pages   bunbo_1   17-04-2013   37   3
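The suffix-array lookup and occurrence sampling described in the first abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`build_suffix_array`, `find_occurrences`, `sample_occurrences`), the naive construction, and the toy corpus are assumptions made for illustration, and the word-alignment and phrase-extraction steps that a real phrase-based system would run on the sampled occurrences are omitted.

```python
import random

def build_suffix_array(tokens):
    # Naive construction by sorting suffixes; adequate for a toy corpus only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, suffix_array, phrase):
    """Return sorted corpus positions where `phrase` (a list of tokens) starts."""
    m = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + m]

    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])

def sample_occurrences(positions, max_samples, seed=0):
    # Cap the number of occurrences examined so that very frequent
    # phrases do not dominate retrieval time.
    if len(positions) <= max_samples:
        return list(positions)
    return sorted(random.Random(seed).sample(positions, max_samples))

corpus = "the cat sat on the mat and the cat ran".split()
sa = build_suffix_array(corpus)
hits = find_occurrences(corpus, sa, ["the", "cat"])
sampled = sample_occurrences(find_occurrences(corpus, sa, ["the"]), max_samples=2)
```

Each lookup costs two binary searches over the suffix array regardless of phrase length, which is why arbitrarily long phrases can be retrieved without storing a precomputed phrase table; the sampling step bounds the work done per phrase after the lookup.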

  • A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we explore the use of BFs for language modelling in statistical machine translation. We show how a BF containing n-grams can enable us to use much larger corpora and higher-order models complementing a conventional n-gram LM within an SMT system.

    PDF, 8 pages   hongvang_1   16-04-2013   34   1
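A Bloom filter holding n-grams, as in the second abstract, can be sketched as follows. This is a minimal illustration, not the paper's language model: the class and helper names, the SHA-256-based double hashing, and the sizing by expected item count and target false-positive rate are assumptions chosen for clarity. The filter answers only "possibly seen" or "definitely not seen"; it stores no counts or probabilities.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _indexes(self, item):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))

def ngrams(tokens, n):
    # Join tokens with a space so each n-gram becomes a single string key.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "we show how a bloom filter can store n-grams".split()
trigrams = ngrams(tokens, 3)
bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for gram in trigrams:
    bf.add(gram)
```

Because an inserted n-gram always tests as present, the filter never produces false negatives; the false-positive rate is controlled by the bit-array size, which is the trade-off that lets much larger corpora fit in memory.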


