Unknown word

Showing 1-20 of 44 results for "Unknown word"
  • We present a statistical model of Japanese unknown words consisting of a set of length and spelling models classified by the character types that constitute a word. The point is quite simple: different character sets should be treated differently and the changes between character types are very important because Japanese script has both ideograms like Chinese (kanji) and phonograms like English (katakana). Both word segmentation accuracy and part of speech tagging accuracy are improved by the proposed model. ...

    pdf8p bunrieu_1 18-04-2013 28 4   Download

  • This paper examines the feasibility of using statistical methods to train a part-of-speech predictor for unknown words. By using statistical methods, without incorporating hand-crafted linguistic information, the predictor could be used with any language for which there is a large tagged training corpus. Encouraging results have been obtained by testing the predictor on unknown words from the Brown corpus. The relative value of information sources such as affixes and context is discussed.

    pdf3p bunrieu_1 18-04-2013 29 3   Download

  • This paper presents a decision-tree approach to the problems of part-of-speech disambiguation and unknown word guessing as they appear in Modern Greek, a highly inflectional language. The learning procedure is tag-set independent and reflects the linguistic reasoning on the specific problems. The decision trees induced are combined with a high-coverage lexicon to form a tagger that achieves 93.5% overall disambiguation accuracy.

    pdf8p bunthai_1 06-05-2013 32 3   Download

  • In this paper, we present a method for guessing POS tags of unknown words using local and global information. Although many existing methods use only local information (i.e. limited window size or intra-sentential features), global information (extra-sentential features) provides valuable clues for predicting POS tags of unknown words. We propose a probabilistic model for POS guessing of unknown words using global information as well as local information, and estimate its parameters using Gibbs sampling.

    pdf8p hongvang_1 16-04-2013 31 2   Download

  • We propose a collaborative framework for collecting Thai unknown words found on Web pages over the Internet. Our main goal is to design and construct a Web-based system which allows a group of interested users to participate in constructing a Thai unknown-word open dictionary. The proposed framework provides supporting algorithms and tools for automatically identifying and extracting unknown words from Web pages of given URLs. The system yields unknown-word candidates, which are presented to the users for verification. ...

    pdf8p hongvang_1 16-04-2013 29 2   Download

  • This paper describes a hybrid model that combines a rule-based model with two statistical models for the task of POS guessing of Chinese unknown words. The rule-based model is sensitive to the type, length, and internal structure of unknown words, and the two statistical models utilize contextual information and the likelihood for a character to appear in a particular position of words of a particular length and POS category.

    pdf6p bunbo_1 17-04-2013 29 2   Download

  • Morphological disambiguation proceeds in 2 stages: (1) an analyzer provides all possible analyses for a given token and (2) a stochastic disambiguation module picks the most likely analysis in context. When the analyzer does not recognize a given token, we hit the problem of unknowns. In large scale corpora, unknowns appear at a rate of 5 to 10% (depending on the genre and the maturity of the lexicon). We address the task of computing the distribution p(t|w) for unknown words for full morphological disambiguation in Hebrew. ...

    pdf9p hongphan_1 15-04-2013 30 1   Download

  • This paper presents an approach to text categorization that i) uses no machine learning and ii) reacts on-the-fly to unknown words. These features are important for categorizing Blog articles, which are updated on a daily basis and filled with newly coined words. We categorize 600 Blog articles into 12 domains. As a result, our categorization method achieved an accuracy of 94.0% (564/600).

    pdf4p hongphan_1 15-04-2013 37 1   Download

  • This paper describes a classifier that assigns semantic thesaurus categories to unknown Chinese words (words not already in the CiLin thesaurus and the Chinese Electronic Dictionary, but in the Sinica Corpus). The focus of the paper differs in two ways from previous research in this particular area. Prior research in Chinese unknown words mostly focused on proper nouns (Lee 1993, Lee, Lee and Chen 1994, Huang, Hong and Chen 1994, Chen and Chen 2000). This paper does not address proper nouns, focusing rather on common nouns, adjectives, and verbs. ...

    pdf8p bunbo_1 17-04-2013 32 1   Download

  • Since written Chinese has no spaces to delimit words, segmenting Chinese texts becomes an essential task. During this task, the problem of unknown words arises. It is impossible to register all words in a dictionary, as new words can always be created by combining characters. We propose a unified solution to detect unknown words in Chinese texts. First, a morphological analysis is done to obtain an initial segmentation and POS tags, and then a chunker is used to detect unknown words.

    pdf4p bunbo_1 17-04-2013 27 1   Download

  • Humans know a great deal about relationships among words. This paper discusses relationships among word pronunciations. We describe a computer system which models human judgement of rhyme by assigning specific roles to the location of primary stress, the similarity of phonetic segments, and other factors. By using the model as an experimental tool, we expect to improve our understanding of rhyme. A related computer model will attempt to generate pronunciations for unknown words by analogy with those for known words. ...

    pdf7p bungio_1 03-05-2013 36 1   Download

  • This paper describes a method of analysing words through morphological decomposition when the lexicon is incomplete. The method is used within a text-to-speech system to help generate pronunciations of unknown words. The method is implemented within a general morphological analyser system using Koskenniemi two-level rules.

    pdf6p buncha_1 08-05-2013 33 1   Download

  • In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield a good balance for learning the characteristics of known and unknown words and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus.

    pdf9p hongphan_1 14-04-2013 26 3   Download

  • The omnipresence of unknown words is a problem that any NLP component needs to address in some form. While there exist many established techniques for dealing with unknown words in the realm of POS-tagging, for example, guessing unknown words’ semantic properties is a less-explored area with greater challenges. In this paper, we study the semantic field of sentiment and propose five methods for assigning prior sentiment polarities to unknown words based on known sentiment carriers.

    pdf4p hongphan_1 15-04-2013 41 2   Download

  • The limited coverage of lexical-semantic resources is a significant problem for NLP systems which can be alleviated by automatically classifying the unknown words. Supersense tagging assigns unknown nouns one of 26 broad semantic categories used by lexicographers to organise their manual insertion into WordNet. Ciaramita and Johnson (2003) present a tagger which uses synonym set glosses as annotated training examples. We describe an unsupervised approach, based on vector-space similarity, which does not require annotated examples but significantly outperforms their tagger. ...

    pdf8p bunbo_1 17-04-2013 33 2   Download

  • Guess the meaning of unknown words. Use the context and your knowledge of prefixes and suffixes and word families to do this. Choose five key words and write them in your reading log. These should be unknown words that are important for understanding the meaning of the passage.

    pdf10p kahty209 20-08-2010 128 45   Download

  • Read without stopping. Do not stop to look up unknown words. You can understand the general idea of a passage without understanding every word. Underline or highlight unknown words, or write them on a separate piece of paper. Guess the meaning of unknown words. Use the context and your knowledge of prefixes and suffixes and word families to do this.

    pdf10p kahty209 20-08-2010 98 30   Download

  • Choose five key words and write them in your reading log. These should be unknown words that are important for understanding the meaning of the passage. Look up the five key words in your dictionary. Write a one-paragraph summary of the passage. Try to use the five key words in your summary.

    pdf10p kahty209 20-08-2010 104 24   Download

  • Read without stopping. Do not stop to look up unknown words. You can understand the general idea of a passage without understanding every word. Underline or highlight unknown words, or write them on a separate piece of paper.

    pdf10p kahty209 20-08-2010 62 19   Download

  • We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features used are obtained without the use of labeled data. The perplexity improvement compared to a state-of-the-art Kneser-Ney model is 4% overall and 81% on unknown histories.

    pdf5p hongdo_1 12-04-2013 31 2   Download
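
Several of the abstracts listed above mention affixes as a source of information for guessing the part of speech of an unknown word. The sketch below illustrates that general idea only; it is not taken from any of the listed papers. It counts which tags co-occur with each word-final suffix in a tagged word list and then guesses a tag from the longest suffix it has seen. The function names, the `MAX_SUFFIX` limit, the default tag, and the toy training data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

MAX_SUFFIX = 3  # assumption: consider suffixes of up to 3 characters


def train_suffix_model(tagged_words):
    """Count how often each word-final suffix co-occurs with each tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        lower = word.lower()
        for n in range(1, min(MAX_SUFFIX, len(lower)) + 1):
            counts[lower[-n:]][tag] += 1
    return counts


def guess_tag(word, counts, default="NOUN"):
    """Return the most frequent tag of the longest suffix seen in training.

    Falls back to `default` (a hypothetical choice) when no suffix matches.
    """
    lower = word.lower()
    for n in range(min(MAX_SUFFIX, len(lower)), 0, -1):
        suffix = lower[-n:]
        if suffix in counts:
            return counts[suffix].most_common(1)[0][0]
    return default


# Toy training data, invented for illustration (not from any corpus).
TRAIN = [
    ("running", "VERB"), ("jumping", "VERB"),
    ("quickly", "ADV"), ("slowly", "ADV"),
    ("happiness", "NOUN"), ("sadness", "NOUN"),
]
MODEL = train_suffix_model(TRAIN)

if __name__ == "__main__":
    for unknown in ["blorfing", "zorply", "glimness"]:
        print(unknown, guess_tag(unknown, MODEL))
```

A real predictor of the kind the abstracts describe would also use context and smoothed probabilities rather than raw counts; this sketch covers only the suffix cue.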


