Linguistic morphology

Showing 1-20 of 33 results for Linguistic morphology
  • Each textbook provides a specific perspective on the discipline that it aims to introduce. Therefore, writing this book has not only been a challenge for me because of the didactic demands that each textbook imposes on its writer. It also forced me to rethink my own ideas on morphology in confrontation with those of others, and to come up with a consistent picture of what morphology is about. This perspective is summarized by the title of this book, The Grammar of Words, which gives the linguistic entity of the word a pivotal role in understanding morphology....

    pdf323p ltkhoi 13-08-2009 2176 1305   Download

  • Arabic handwriting recognition (HR) is a challenging problem due to Arabic’s connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identification of erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. Our best approach achieves a roughly 15% absolute increase in F-score over a simple but reasonable baseline. ...

    pdf10p hongdo_1 12-04-2013 26 3   Download

  • We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed and open-class items, and focusing on the likelihood of the possible stems of the open-class words. By encoding some basic linguistic information, the machine learning task is simplified, while achieving state-of-the-art tokenization results and competitive POS results, although with a reduced tag set and some evaluation difficulties.

    pdf6p hongdo_1 12-04-2013 24 3   Download

  • Morphological processes in Semitic languages deliver space-delimited words which introduce multiple, distinct, syntactic units into the structure of the input sentence. These words are in turn highly ambiguous, breaking the assumption underlying most parsers that the yield of a tree for a given sentence is known in advance. Here we propose a single joint model for performing both morphological segmentation and syntactic disambiguation which bypasses the associated circularity.

    pdf9p hongphan_1 15-04-2013 21 3   Download

  • We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.

    pdf11p bunthai_1 06-05-2013 26 3   Download

  • A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. ...

    pdf8p bunthai_1 06-05-2013 19 2   Download

  • Concerning different approaches to automatic PoS tagging: EngCG-2, a constraint-based morphological tagger, is compared in a double-blind test with a state-of-the-art statistical tagger on a common disambiguation task using a common tag set. The experiments show that for the same amount of remaining ambiguity, the error rate of the statistical tagger is one order of magnitude greater than that of the rule-based one. The two related issues of priming effects compromising the results and disagreement between human annotators are also addressed. ...

    pdf8p bunthai_1 06-05-2013 24 2   Download

  • Our poster presents results and experiences from the application of the system to 300,000 word forms, a subpart of a larger corpus. The application of the system is carried out in two steps, an automatic lexical look up followed by homograph separation, which is done partly automatically, partly manually. Lexical and morphological analysis and disambiguation of Swedish is a rather complicated task, a fact which should hold for several other languages as well. Below a sample text is given, showing both the amount of information that has to be specified for each word form and the degree of...

    pdf1p buncha_1 08-05-2013 23 2   Download

  • Due to Arabic’s morphological complexity, Arabic retrieval benefits greatly from morphological analysis – particularly stemming. However, the best known stemming does not handle linguistic phenomena such as broken plurals and malformed stems. In this paper we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext to page title links.

    pdf5p nghetay_1 07-04-2013 9 1   Download

  • We address the problem of translating from morphologically poor to morphologically rich languages by adding per-word linguistic information to the source language. We use the syntax of the source sentence to extract information for noun cases and verb persons and annotate the corresponding words accordingly. In experiments, we show improved performance for translating from English into Greek and Czech. For English–Greek, we reduce the error on the verb conjugation from 19% to 5.4% and noun case agreement from 9% to 6%. ...

    pdf8p hongphan_1 15-04-2013 17 1   Download

  • This paper proposes the application of finite-state approximation techniques on a unification-based grammar of word formation for a language like German. A refinement of an RTN-based approximation algorithm is proposed, which extends the state space of the automaton by selectively adding distinctions based on the parsing history at the point of entering a context-free rule. The selection of history items exploits the specific linguistic nature of word formation.

    pdf8p bunbo_1 17-04-2013 23 1   Download

  • We describe a computational framework for a grammar architecture in which different linguistic domains such as morphology, syntax, and semantics are treated not as separate components but as compositional domains. The framework is based on Combinatory Categorial Grammars and it uses the morpheme as the basic building block of the categorial lexicon.

    pdf3p bunmoc_1 20-04-2013 17 1   Download

  • In this paper, we present a morphological processor for Modern Greek. From the linguistic point of view, we try to elucidate the complexity of the inflectional system using a lexical model which follows the recent work by Lieber 1980, Selkirk 1982, Kiparsky 1982, and others. The implementation is based on the concept of "validation grammars" (Courtin 1977). The morphological processing is controlled by a finite automaton and it combines a. a dictionary containing the stems for a representative fragment of Modern Greek and all the inflectional affixes with b.

    pdf6p buncha_1 08-05-2013 13 1   Download

  • We present a notation for the declarative statement of morphological relationships and lexical rules, based on the traditional notion of Word and Paradigm (cf Hockett 1954). The phenomenon of blocking arises from a generalized version of Kiparsky's (1973) Elsewhere Condition, stated in terms of ordering by subsumption over paradigms. Orthographic constraints on morphemic alternation are described by means of string equations (Siekmann 1975).

    pdf8p buncha_1 08-05-2013 21 1   Download

  • The correction method distinguishes between orthographic errors and typographical errors. Typographical errors (or mistypings) are noncognitive errors that do not follow linguistic criteria. Orthographic errors are cognitive errors that occur when the writer does not know or has forgotten the correct spelling of a word. They are more persistent because of their cognitive nature, they leave a worse impression, and their treatment is an interesting application for language standardization purposes. ...

    pdf1p buncha_1 08-05-2013 28 1   Download

  • 1. The purpose of this paper is the establishment of classes of verbals according to the morphemic alternations of base-form finals; 2. Verbals which are subject to morphemic alternation are treated as single entries instead of as multiple entries;

    pdf12p nghetay_1 06-04-2013 22 2   Download

  • We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics.

    pdf5p hongdo_1 12-04-2013 21 3   Download

  • Hungarian is a stereotype of morphologically rich and non-configurational languages. Here, we introduce results on dependency parsing of Hungarian that employ an 80K, multi-domain, fully manually annotated corpus, the Szeged Dependency Treebank. We show that the results achieved by state-of-the-art data-driven parsers on Hungarian and English (which is at the other end of the configurational-nonconfigurational spectrum) are quite similar to each other in terms of attachment scores.

    pdf11p bunthai_1 06-05-2013 20 3   Download

  • We investigate the controversial issue of the upper bound of interjudge agreement in the use of a low-level grammatical representation. Pessimistic views suggest that several percent of words in running text are undecidable in terms of part-of-speech categories. Our experiments with 55,000 words of data give reason for optimism: linguists with only 30 hours' training apply the EngCG-2 morphological tags with almost 100% interjudge agreement.

    pdf5p bunthai_1 06-05-2013 17 3   Download

  • Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic work has been done for this language. In this paper, we present our work on part of speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined suitable tagset was available, we develop a tagset of 172 tags in consultation with experts in linguistics.

    pdf4p hongphan_1 15-04-2013 20 2   Download

