Morphologically complex languages

Showing 1-20 of 20 results for Morphologically complex languages
  • We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level.

    pdf10p hongdo_1 12-04-2013 43 2   Download

  • We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving an accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.

    pdf11p bunthai_1 06-05-2013 41 3   Download

  • This paper describes our current research on the properties of derivational affixation in English. Our research arises from a more general research project, the Lexical Systems project at the IBM Thomas J. Watson Research laboratories, the goal for which is to build a variety of computerized dictionary systems for use both by people and by computer programs. An important sub-goal is to build reliable and robust word recognition mechanisms for these dictionaries.

    pdf8p bungio_1 03-05-2013 29 1   Download

  • Morphologically complex terms composed from Greek or Latin elements are frequent in scientific and technical texts. Word-forming units are thus relevant cues for the identification of terms in domain-specific texts. This article describes a method for the automatic extraction of terms relying on the detection of classical prefixes and word-initial combining forms. Word-forming units are identified using a regular expression. The system then extracts terms by selecting words which either begin or coalesce with these elements. ...

    pdf4p bunthai_1 06-05-2013 33 2   Download
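
The regular-expression step described in the abstract above can be illustrated with a minimal sketch. The inventory `CLASSICAL_FORMS`, the pattern, and the function name `extract_candidate_terms` are assumptions made for illustration; the article's actual list of forms and its expression are not given in the abstract. The sketch covers only words that begin with a listed form, not the "coalesce" case.

```python
import re

# Hypothetical, tiny inventory of classical prefixes and combining forms.
# The article's real list is not reproduced in the abstract.
CLASSICAL_FORMS = ["hyper", "hypo", "micro", "neuro", "cardio", "hydro", "poly", "pseudo"]

# Put longer forms first so the alternation prefers the longest candidate.
_alternation = "|".join(
    sorted((re.escape(form) for form in CLASSICAL_FORMS), key=len, reverse=True)
)

# A word that starts with a listed form, optionally followed by a hyphen,
# and then at least three more letters.
TERM_PATTERN = re.compile(rf"\b(?:{_alternation})-?[a-z]{{3,}}\b", re.IGNORECASE)


def extract_candidate_terms(text):
    """Return the words in `text` that begin with a listed form, in order."""
    return [match.group(0) for match in TERM_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "Chronic hypertension may cause microvascular damage and cardiomyopathy."
    print(extract_candidate_terms(sample))
```

A real system would need a much larger inventory and some filtering, since ordinary words can also start with these letter sequences.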

  • We present a novel method for predicting inflected word forms for generating morphologically rich languages in machine translation. We utilize a rich set of syntactic and morphological knowledge sources from both source and target sentences in a probabilistic model, and evaluate their contribution in generating Russian and Arabic sentences. Our results show that the proposed model substantially outperforms the commonly used baseline of a trigram target language model; in particular, the use of morphological and syntactic features leads to large gains in prediction accuracy. ...

    pdf8p hongvang_1 16-04-2013 32 1   Download

  • The quality of the part-of-speech (PoS) annotation in a corpus is crucial for the development of PoS taggers. In this paper, we experiment with three complementary methods for automatically detecting errors in the PoS annotation for the Icelandic Frequency Dictionary corpus. The first two methods are language independent and we argue that the third method can be adapted to other morphologically complex languages. Once possible errors have been detected, we examine each error candidate and hand-correct the corresponding PoS tag if necessary. ...

    pdf9p bunthai_1 06-05-2013 31 1   Download

  • Turkish is an agglutinative language with complex morphological structures; therefore, using only word forms is not enough for many computational tasks. In this paper we analyze the effect of morphology in a Named Entity Recognition system for Turkish. We start with the standard word-level representation and incrementally explore the effect of capturing syntactic and contextual properties of tokens. Furthermore, we also explore a new representation in which roots and morphological features are represented as separate tokens instead of representing only words as tokens. ...

    pdf6p hongdo_1 12-04-2013 42 3   Download
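
The difference between the word-level representation and the representation with roots and morphological features as separate tokens can be sketched as follows. This is only an illustration: the `Analysis` type, the function names, and the hand-written analysis of "evlerinde" (with illustrative feature labels) are assumptions, and a morphological analyzer and disambiguator would supply such analyses in practice.

```python
from typing import List, NamedTuple


class Analysis(NamedTuple):
    """One disambiguated morphological analysis (hypothetical structure)."""
    surface: str          # the word as it appears in the text
    root: str             # the root proposed by an analyzer
    features: List[str]   # morphological feature labels, in order


def word_level_tokens(analyses: List[Analysis]) -> List[str]:
    """Standard representation: one token per surface word."""
    return [a.surface for a in analyses]


def root_and_feature_tokens(analyses: List[Analysis]) -> List[str]:
    """Alternative representation: the root and each feature are separate tokens."""
    tokens: List[str] = []
    for a in analyses:
        tokens.append(a.root)
        tokens.extend("+" + feature for feature in a.features)
    return tokens


if __name__ == "__main__":
    # Hand-written, illustrative analysis; not the output of a real analyzer.
    sentence = [Analysis("evlerinde", "ev", ["Noun", "A3pl", "P3sg", "Loc"])]
    print(word_level_tokens(sentence))
    print(root_and_feature_tokens(sentence))
```

With separate tokens, a sequence model sees the same root token across all inflected forms of a word, which reduces sparsity at the cost of longer sequences.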

  • Morphological lexica are often implemented on top of morphological paradigms, corresponding to different ways of building the full inflection table of a word. Computationally precise lexica may use hundreds of paradigms, and it can be hard for a lexicographer to choose among them. To automate this task, this paper introduces the notion of a smart paradigm. It is a metaparadigm, which inspects the base form and tries to infer which low-level paradigm applies. If the result is uncertain, more forms are given for discrimination.

    pdf9p bunthai_1 06-05-2013 41 3   Download
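
The idea of a smart paradigm can be sketched with a toy example for English nouns. This is not the paper's implementation: the function names and the spelling heuristics are assumptions chosen for illustration. The function inspects the base form to pick a low-level plural rule, and the caller gives a second form only when the base form is not enough.

```python
from typing import Dict, Optional


def regular_plural(singular: str) -> str:
    """Guess the plural from the base form alone, using simple spelling heuristics."""
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        return singular + "es"
    if singular.endswith("y") and len(singular) > 1 and singular[-2] not in "aeiou":
        return singular[:-1] + "ies"
    return singular + "s"


def smart_noun(singular: str, plural: Optional[str] = None) -> Dict[str, str]:
    """Build a two-cell inflection table.

    If `plural` is given, it overrides the guess; this models the case where
    the base form is not enough and more forms are supplied for discrimination.
    """
    return {
        "sg": singular,
        "pl": plural if plural is not None else regular_plural(singular),
    }


if __name__ == "__main__":
    print(smart_noun("city"))
    print(smart_noun("mouse", "mice"))
```

The lexicographer calls one function for every noun instead of choosing among many named paradigms; irregular entries simply pass one more form.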

  • We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. ...

    pdf11p hongdo_1 12-04-2013 42 2   Download

  • This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English to Finnish translation system). Our methods use unsupervised morphology induction. Unlike previous work we focus on morphologically productive phrase pairs – our decoder can combine morphemes across phrase boundaries. Morphemes in the target language may not have a corresponding morpheme or word in the source language.

    pdf11p hongdo_1 12-04-2013 40 2   Download

  • Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages as they cannot deal with more complex compound forming processes. We present a novel and unsupervised method to learn the compound parts and morphological operations needed to split compounds into their compound parts.

    pdf10p hongdo_1 12-04-2013 32 2   Download

  • Arabic is a morphologically complex language. Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. In this paper we propose a new automatic headline generation technique that uses character cross-correlation to extract the best headlines and to overcome the complex morphology of Arabic.

    pdf5p hongdo_1 12-04-2013 37 2   Download
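
The abstract above does not define its character cross-correlation measure. The sketch below shows one generic reading, stated as an assumption: slide one word over the other and take the largest number of position-wise character matches over all alignments. The function name `char_cross_correlation` is hypothetical, and the paper's actual formulation may differ.

```python
def char_cross_correlation(a: str, b: str) -> int:
    """Largest number of matching characters over all alignments of b against a.

    Assumed definition for illustration only; not taken from the paper.
    """
    best = 0
    # lag is the position in `a` that the first character of `b` is aligned to.
    for lag in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, ch in enumerate(b)
            if 0 <= i + lag < len(a) and a[i + lag] == ch
        )
        best = max(best, matches)
    return best


if __name__ == "__main__":
    # A stem and the same stem with the definite-article prefix attached.
    print(char_cross_correlation("الكتاب", "كتاب"))
```

Because the score is taken over all shifts, a word with an attached prefix or clitic can still align fully with its bare stem, which is why a measure of this kind tolerates affixation better than exact string comparison.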

  • Morphological analysis must take into account the spelling-change processes of a language as well as its possible configurations of stems, affixes, and inflectional markings. The computational difficulty of the task can be clarified by investigating specific models of morphological processing. The use of finite-state machinery in the "two-level" model by Kimmo Koskenniemi gives it the appearance of computational efficiency, but closer examination shows the model does not guarantee efficient processing.

    pdf7p bungio_1 03-05-2013 39 2   Download

  • Data-driven POS tagging has achieved good performance for English, but can still lag behind linguistic rule-based taggers for morphologically complex languages, such as Icelandic. We extend a statistical tagger to handle fine-grained tagsets and improve over the best Icelandic POS tagger. Additionally, we develop a case tagger for non-local case and gender decisions. An error analysis of our system suggests future directions.

    pdf4p hongphan_1 15-04-2013 23 1   Download

  • This paper describes the problems faced while using Kimmo's two-level model to describe certain Indian languages such as Tamil and Hindi. The two-level model is shown to be descriptively inadequate to address these problems. A simple extension to the basic two-level model is introduced which allows conflicting phonological rules to coexist. The computational complexity of the extension is the same as Kimmo's two-level model.

    pdf3p bunmoc_1 20-04-2013 28 1   Download

  • This paper presents an application of finite state transducers weighted with feature structure descriptions, following Amtrup (2003), to the morphology of the Semitic language Tigrinya. It is shown that feature-structure weights provide an efficient way of handling the templatic morphology that characterizes Semitic verb stems as well as the long-distance dependencies characterizing the complex Tigrinya verb morphotactics. A relatively complete computational implementation of Tigrinya verb morphology is described. ...

    pdf9p bunthai_1 06-05-2013 27 1   Download

  • In this paper, we present a morphological processor for Modern Greek. From the linguistic point of view, we try to elucidate the complexity of the inflectional system using a lexical model which follows the recent work by Lieber 1980, Selkirk 1982, Kiparsky 1982, and others. The implementation is based on the concept of "validation grammars" (Courtin 1977). The morphological processing is controlled by a finite automaton and it combines a. a dictionary containing the stems for a representative fragment of Modern Greek and all the inflectional affixes with b.

    pdf6p buncha_1 08-05-2013 25 1   Download

  • This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free-constituent order language with complex agglutinative inflectional and derivational morphology and presents interesting challenges for statistical parsing, as in general, dependency relations are between “portions” of words – called inflectional groups. We have explored statistical models that use different representational units for parsing.

    pdf8p bunthai_1 06-05-2013 42 2   Download

  • Arabic morphology is complex, partly because of its richness, and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns), and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morphosyntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic.

    pdf11p bunthai_1 06-05-2013 44 2   Download

  • Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved highly accurate Arabic NER system.

    pdf5p hongdo_1 12-04-2013 46 4   Download


