Derived words

Showing 1-20 of 92 results for Derived words
  • We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.

    pdf8p bunmoc_1 20-04-2013 11 1

  • This paper presents a morphological component with a limited capability to automatically interpret (and generate) derived words. The system combines an extended two-level morphology [Trost, 1991a; Trost, 1991b] with a feature-based word grammar building on a hierarchical lexicon. Polymorphemic stems not explicitly stored in the lexicon are given a compositional interpretation.

    pdf9p buncha_1 08-05-2013 18 1

  • The existence of words is usually taken for granted by the speakers of a language. To speak and understand a language means, among many other things, knowing the words of that language. The average speaker knows thousands of words, and new words enter our minds and our language on a daily basis.

    pdf264p phongvan_du 01-01-2010 1377 669

  • Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill's prior consent.

    pdf10p kathy205 29-07-2010 55 6

  • This work investigates supervised word alignment methods that exploit inversion transduction grammar (ITG) constraints. We consider maximum margin and conditional likelihood objectives, including the presentation of a new normal form grammar for canonicalizing derivations. Even for non-ITG sentence pairs, we show that it is possible to learn ITG alignment models by simple relaxations of structured discriminative learning objectives. For efficiency, we describe a set of pruning techniques that together allow us to align sentences two orders of magnitude faster than naive bitext CKY parsing.

    pdf9p hongphan_1 14-04-2013 15 4

  • In this paper, we present a novel approach which incorporates web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class.

    pdf10p hongdo_1 12-04-2013 15 3

  • This work introduces a new approach to checking treebank consistency. Derivation trees based on a variant of Tree Adjoining Grammar are used to compare the annotation of word sequences based on their structural similarity. This overcomes the problems of earlier approaches based on using strings of words rather than tree structure to identify the appropriate contexts for comparison. We report on the result of applying this approach to the Penn Arabic Treebank and how this approach leads to high precision of error detection. ...

    pdf6p hongdo_1 12-04-2013 10 2

  • Automatic acquisition of translation rules from parallel sentence-aligned text takes a variety of forms. Some machine translation (MT) systems treat aligned sentences as unstructured word sequences. Other systems, including our own (Grishman, 1994; Meyers et al., 1996), syntactically analyze (parse) sentences before acquiring transfer rules (cf. Kaji et al., 1992; Matsumoto et al., 1993; Kitamura and Matsumoto, 1995). This has the advantage of acquiring structural as well as lexical correspondences. ...

    pdf5p bunrieu_1 18-04-2013 30 2

  • Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without using any lexicon or hand-crafted linguistic resources. The statistical data required by the algorithm, that is, mutual information and the difference of t-score between characters, is derived automatically from raw Chinese corpora. A preliminary experiment shows that the segmentation accuracy of our algorithm is acceptable.

    pdf7p bunrieu_1 18-04-2013 19 2

  • This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method enhances information retrieval performance significantly.

    pdf8p bunthai_1 06-05-2013 16 2

  • Dictionaries are now commonly used resources in NLP systems. However, different lexical resources are not uniform; they contain different types of information and do not assign words the same number of senses. One way in which this problem might be tackled is by producing mappings between the senses of different resources, the "dictionary mapping problem". However, this is a non-trivial problem, as examination of existing lexical resources demonstrates.

    pdf2p bunthai_1 06-05-2013 26 2

  • A system for the automatic production of controlled index terms is presented using linguistically motivated techniques. It includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. ...

    pdf8p bunthai_1 06-05-2013 19 2

  • Identifying whether a multi-word expression (MWE) is compositional or not is important for numerous NLP applications. Sense induction can partition the context of MWEs into semantic uses and therefore aid in deciding compositionality. We propose an unsupervised system to explore this hypothesis on compound nominals, proper names and adjective-noun constructions, and evaluate the contribution of sense induction. The evaluation set is derived from WordNet in a semi-supervised way. Graph connectivity measures are employed for unsupervised parameter tuning. ...

    pdf4p hongphan_1 15-04-2013 24 1

  • We present a general framework to incorporate prior knowledge such as heuristics or linguistic features in statistical generative word alignment models. Prior knowledge acts as probabilistic soft constraints between bilingual word pairs that are used to guide word alignment model training. We investigate knowledge that can be derived automatically from the entropy principle and bilingual latent semantic analysis and show how it can be applied to improve translation performance.

    pdf8p hongvang_1 16-04-2013 13 1

  • A distributional method for part-of-speech induction is presented which, in contrast to most previous work, determines the part-of-speech distribution of syntactically ambiguous words without explicitly tagging the underlying text corpus. This is achieved by assuming that the word pair consisting of the left and right neighbor of a particular token is characteristic of the part of speech at this position, and by clustering the neighbor pairs on the basis of their middle words as observed in a large corpus.

    pdf4p hongvang_1 16-04-2013 13 1

  • This paper proposes the application of finite-state approximation techniques to a unification-based grammar of word formation for a language like German. A refinement of an RTN-based approximation algorithm is proposed, which extends the state space of the automaton by selectively adding distinctions based on the parsing history at the point of entering a context-free rule. The selection of history items exploits the specific linguistic nature of word formation.

    pdf8p bunbo_1 17-04-2013 23 1

  • An investment of effort over the last two years has begun to produce a wealth of data concerning computational psycholinguistic models of syntax acquisition. The data is generated by running simulations on a recently completed database of word order patterns from over 3,000 abstract languages. This article presents the design of the database which contains sentence patterns, grammars and derivations that can be used to test acquisition models from widely divergent paradigms.

    pdf8p bunbo_1 17-04-2013 26 1

  • We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership.

    pdf8p bunmoc_1 20-04-2013 16 1

  • This paper defines multiset-valued linear index grammar and unordered vector grammar with dominance links. The former models certain uses of multiset-valued feature structures in unification-based formalisms, while the latter is motivated by word order variation and by "quasi-trees", a generalization of trees. The two formalisms are weakly equivalent, and an important subset is at most context-sensitive and polynomially parsable.

    pdf8p bunmoc_1 20-04-2013 27 1

  • We describe a methodology and associated software system for the construction of a large lexicon from an existing machine-readable (published) dictionary. The lexicon serves as a component of an English morphological and syntactic analyser and contains entries with grammatical definitions compatible with the word and sentence grammar employed by the analyser. We describe a software system with two integrated components. One of these is capable of extracting syntactically rich, theory-neutral lexical templates from a suitable machine-readable source. ...

    pdf8p bungio_1 03-05-2013 21 1
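The Chinese word segmentation abstract above derives its statistics, including mutual information between characters, automatically from raw text. A minimal Python sketch of the mutual-information part follows, assuming pointwise mutual information over adjacent character pairs and a hypothetical boundary threshold; the t-score difference the abstract also uses is omitted, and the names train_counts, pmi and segment are illustrative, not the paper's.

```python
import math
from collections import Counter


def train_counts(corpus):
    """Count single characters and adjacent character pairs in raw, unsegmented text."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information log2 P(ab) / (P(a) P(b)); -inf if ab was never seen."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    joint = bigrams[a + b]
    if joint == 0 or unigrams[a] == 0 or unigrams[b] == 0:
        return float("-inf")
    p_ab = joint / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log2(p_ab / (p_a * p_b))


def segment(sentence, unigrams, bigrams, threshold=0.0):
    """Insert a word boundary wherever adjacent characters score below the (hypothetical) threshold."""
    if not sentence:
        return []
    words = [sentence[0]]
    for prev, cur in zip(sentence, sentence[1:]):
        if pmi(prev, cur, unigrams, bigrams) >= threshold:
            words[-1] += cur  # strongly associated: keep in the same word
        else:
            words.append(cur)  # weakly associated: start a new word
    return words
```

For example, after training on ["ab", "ab", "ab", "cd", "cd", "cd", "bc"], segment("abcd", unigrams, bigrams, threshold=1.0) returns ["ab", "cd"]: the rare pair "bc" scores about 0.81 bits, while "ab" and "cd" each score about 2.81 bits.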
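The distributional clustering abstract above names three ingredients: a word's relative frequency distribution over contexts, relative entropy between distributions as the similarity measure, and cluster centroids averaged from member words by membership probability. The sketch below assumes contexts are plain string keys and that membership is proportional to exp(-beta * D(word || centroid)); the beta parameter and all function names are hypothetical additions, not taken from the abstract.

```python
import math


def normalize(counts):
    """Turn raw context counts into a relative frequency distribution."""
    total = sum(counts.values())
    return {ctx: n / total for ctx, n in counts.items()}


def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits; infinite if p has mass where q has none."""
    total = 0.0
    for ctx, p_x in p.items():
        if p_x == 0:
            continue
        q_x = q.get(ctx, 0.0)
        if q_x == 0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total


def memberships(word_dist, centroids, beta=1.0):
    """Soft membership proportional to exp(-beta * D(word || centroid)); beta is hypothetical."""
    weights = [math.exp(-beta * kl_divergence(word_dist, c)) for c in centroids]
    z = sum(weights)
    if z == 0:
        raise ValueError("word has infinite divergence from every centroid")
    return [w / z for w in weights]


def centroid(word_dists, probs):
    """Average context distribution, weighting each word by its membership probability."""
    total = sum(probs)
    merged = {}
    for dist, weight in zip(word_dists, probs):
        for ctx, p_x in dist.items():
            merged[ctx] = merged.get(ctx, 0.0) + weight * p_x / total
    return merged
```

Because D(p || q) is asymmetric and becomes infinite when a centroid lacks a context the word uses, averaging centroids over many member words (as centroid does) helps keep their support broad enough for the divergence to stay finite.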
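The part-of-speech induction abstract above characterizes each token position by the pair formed by its left and right neighbors, then clusters those neighbor pairs by the middle words observed between them in a corpus. The sketch below covers only the context-collection step plus a pairwise similarity score; cosine similarity over middle-word counts is an assumption standing in for whatever measure the paper uses, and the clustering step itself is not shown.

```python
import math
from collections import Counter, defaultdict


def middle_words_by_pair(tokens):
    """Map each (left, right) neighbor pair to a Counter of the words seen between them."""
    table = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        table[(left, right)][middle] += 1
    return table


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed similarity measure)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

On the token sequence "the dog runs the cat runs a dog sleeps", the pair ("the", "runs") collects the middle words "dog" and "cat", so it scores as similar to ("a", "sleeps"), which collects "dog"; pairs with overlapping middle words are the ones a clustering step would group together.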

