Character n-grams

Showing 1-5 of 5 results for Character n-grams
  • This paper proposes the use of local histograms (LHs) over character n-grams for authorship attribution (AA). LHs are enriched histogram representations that preserve sequential information in documents; they have been used successfully for text categorization and document visualization with word histograms. In this work we explore the suitability of LHs over character-level n-grams for AA. We show that LHs are particularly helpful for AA because they provide useful information for uncovering, to some extent, the writing style of authors.

    pdf11p hongdo_1 12-04-2013 50 2

  • While OOV words are a problem for most languages in ASR, in the Chinese case the problem can be avoided by using character n-grams, and moderate performance can be obtained. However, character n-grams have their own limitations, and properly adding new words can increase ASR performance. Here we propose a discriminative lexicon adaptation approach for improved character accuracy, which not only adds new words but also deletes some words from the current lexicon.

    pdf9p hongphan_1 14-04-2013 43 2

  • We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model.

    pdf5p nghetay_1 07-04-2013 24 1

  • We present a method for computer-assisted authorship attribution based on character-level n-gram language models. Our approach is based on simple information-theoretic principles and achieves improved performance across a variety of languages without requiring extensive pre-processing or feature selection.

    pdf8p bunthai_1 06-05-2013 27 1

  • In this paper, we present a stochastic language model for Japanese using dependency. The prediction unit in this model is an attribute of a "bunsetsu", represented by the product of the head of its content words and that of its function words. The relation between the attributes of "bunsetsu" is governed by a context-free grammar. Word sequences are predicted from the attribute using a word n-gram model. The spelling of an unknown word is predicted using a character n-gram model.

    pdf7p bunrieu_1 18-04-2013 35 2

