
Chinese unknown words

Showing 1-7 of 7 results for Chinese unknown words
  • This paper describes a hybrid model that combines a rule-based model with two statistical models for the task of POS guessing of Chinese unknown words. The rule-based model is sensitive to the type, length, and internal structure of unknown words, and the two statistical models utilize contextual information and the likelihood for a character to appear in a particular position of words of a particular length and POS category.

    PDF, 6 pages · bunbo_1 · 17-04-2013

  • This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area. Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and Chen 1994, Huang, Hong and Chen 1994, Chen and Chen 2000). This paper does not address proper nouns, focusing rather on common nouns, adjectives, and verbs. ...

    PDF, 8 pages · bunbo_1 · 17-04-2013

  • Since written Chinese has no spaces to delimit words, segmenting Chinese text is an essential task. During segmentation, the problem of unknown words arises: it is impossible to register every word in a dictionary, because new words can always be created by combining characters. We propose a unified solution for detecting unknown words in Chinese texts. First, morphological analysis produces an initial segmentation and POS tags; then a chunker is used to detect unknown words.

    PDF, 4 pages · bunbo_1 · 17-04-2013

  • We present a statistical model of Japanese unknown words consisting of a set of length and spelling models classified by the character types that constitute a word. The point is quite simple: different character sets should be treated differently and the changes between character types are very important because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). Both word segmentation accuracy and part of speech tagging accuracy are improved by the proposed model. ...

    PDF, 8 pages · bunrieu_1 · 18-04-2013

  • In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe strategies that yield a good balance in learning the characteristics of known and unknown words, and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus.

    PDF, 9 pages · hongphan_1 · 14-04-2013

  • Chinese abbreviations are widely used in modern Chinese texts. Compared with English abbreviations (which are mostly acronyms and truncations), the formation of Chinese abbreviations is much more complex. Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated.

    PDF, 9 pages · hongphan_1 · 15-04-2013

  • In this paper, the authors present a hybrid method, combining rules and statistics, for re-translating UKWs (unknown words) of the named-entity and numerical-expression types. Applying this method to a Chinese-Vietnamese statistical translation system, experimental results show that our method significantly improves the performance of Chinese-Vietnamese statistical machine translation.

    PDF, 12 pages · binhminhmuatrenngondoithonggio · 09-06-2017
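
The first abstract in this list mentions a statistical model based on "the likelihood for a character to appear in a particular position of words of a particular length and POS category". As a rough illustration of that idea only, the sketch below estimates such likelihoods from a tiny lexicon and picks the POS tag with the highest score for an unseen word. The toy lexicon, the tag names `N` and `V`, the add-one smoothing, and the function names are all assumptions made for this example; they are not taken from the paper, and the paper's rule-based and contextual models are not covered.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy lexicon of (word, POS) pairs; not data from any listed paper.
LEXICON = [
    ("老师", "N"), ("老板", "N"), ("学生", "N"), ("医生", "N"),
    ("学习", "V"), ("练习", "V"), ("研究", "V"), ("讨论", "V"),
]


def train(lexicon):
    """Count characters per (POS, word length, position) slot."""
    counts = defaultdict(Counter)
    vocab = set()
    for word, pos in lexicon:
        for i, ch in enumerate(word):
            counts[(pos, len(word), i)][ch] += 1
            vocab.add(ch)
    return counts, vocab


def log_likelihood(word, pos, counts, vocab):
    """Sum of add-one smoothed log P(char | POS, length, position)."""
    total = 0.0
    for i, ch in enumerate(word):
        slot = counts[(pos, len(word), i)]
        # The extra +1 in the denominator reserves mass for unseen characters.
        total += math.log((slot[ch] + 1) / (sum(slot.values()) + len(vocab) + 1))
    return total


def guess_pos(word, lexicon=LEXICON):
    """Return the POS tag whose positional character model scores the word highest."""
    counts, vocab = train(lexicon)
    tags = sorted({pos for _, pos in lexicon})
    return max(tags, key=lambda pos: log_likelihood(word, pos, counts, vocab))


if __name__ == "__main__":
    for unknown in ["女生", "复习"]:
        print(unknown, guess_pos(unknown))
```

With this toy lexicon the guess is driven entirely by the final character (生 appears word-finally only in nouns, 习 only in verbs), which is why a real system would combine such scores with contextual and rule-based evidence, as the abstract describes.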
