Agglutinative languages

Showing 1-13 of 13 results for Agglutinative languages
  • Previous probabilistic part-of-speech tagging models for agglutinative languages have considered only the lexical forms of morphemes, not the surface forms of words, which leads to inaccurate probability estimates. The proposed model is based on the observation that when several words (surface forms) share the same lexical forms, their probabilities of occurrence differ. It is also designed to consider the lexical form of a word. Experiments show that the proposed model outperforms the bigram Hidden Markov model (HMM)-based tagging model.

  • In word processors, thesauri help the user choose the most suitable synonym for a word. Words in texts of agglutinative languages almost always occur as inflected forms, so looking them up directly in a stem vocabulary is impossible. This paper introduces Helyette, an inflectional thesaurus that copes with this problem.

  • Morphological segmentation has been shown to be beneficial to a range of NLP tasks such as machine translation, speech recognition, speech synthesis and information retrieval. Recently, a number of approaches to unsupervised morphological segmentation have been proposed. This paper describes an algorithm that draws from previous approaches and combines them into a simple model for morphological segmentation that outperforms other approaches on English and German, and also yields good results on agglutinative languages such as Finnish and Turkish. ...

  • In this article we present the results of combining stochastic and rule-based methods applied to the morphosyntactic disambiguation of Basque. The methods used for disambiguation are Constraint Grammar (CG) and the HMM-based tagger from the MULTEXT project. The agglutinative nature of Basque makes it necessary to use a morphological analyzer to assign every possible interpretation to each word. The CG rules are applied using the full morphological information, and this process partially reduces the ambiguity of the texts. ...

  • This paper introduces a new approach to morpho-syntactic analysis through Humor 99 (High-speed Unification Morphology), a reversible, unification-based morphological analyzer that has already been integrated into a variety of industrial applications. Humor 99 copes effectively with the problems of agglutinative languages (e.g. Hungarian, Turkish, Estonian) and other (highly) inflectional languages (e.g. Polish, Czech, German).

  • This poster paper describes a full-scale two-level morphological description (Karttunen, 1983; Koskenniemi, 1983) of Turkish word structures. The description has been implemented using the PC-KIMMO environment (Antworth, 1990) and is based on a root word lexicon of about 23,000 root words. Almost all the special cases of, and exceptions to, phonological and morphological rules have been implemented. Turkish is an agglutinative language with word structures formed by productive affixation of derivational and inflectional suffixes to root words. ...

  • This book focuses primarily on speech recognition and related tasks such as speech enhancement and modeling. It comprises three sections and thirteen chapters written by eminent researchers from the USA, Brazil, Australia, Saudi Arabia, Japan, Ireland, Taiwan, Mexico, Slovakia and India. Section 1, on speech recognition, consists of seven chapters. Sections 2 and 3, on speech enhancement and speech modeling respectively, have three chapters each and supplement Section 1.

  • Turkish is an agglutinative language with complex morphological structures; therefore, using only word forms is not enough for many computational tasks. In this paper we analyze the effect of morphology in a Named Entity Recognition system for Turkish. We start with the standard word-level representation and incrementally explore the effect of capturing syntactic and contextual properties of tokens. Furthermore, we explore a new representation in which roots and morphological features are represented as separate tokens, instead of representing only words as tokens. ...

  • This paper presents results from the first statistical dependency parser for Turkish. Turkish is a free constituent order language with complex agglutinative inflectional and derivational morphology, and it presents interesting challenges for statistical parsing because, in general, dependency relations hold between "portions" of words, called inflectional groups. We have explored statistical models that use different representational units for parsing.

  • Assamese is a morphologically rich, agglutinative Indic language with relatively free word order. Although it is spoken by nearly 30 million people, very little computational linguistic work has been done on it. In this paper, we present our work on part-of-speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined, suitable tagset was available, we developed a tagset of 172 tags in consultation with experts in linguistics.

  • Morphological disambiguation is the process of assigning one set of morphological features to each individual word in a text. When the word is ambiguous (there are several possible analyses for the word), a disambiguation procedure based on the word context must be applied. This paper deals with morphological disambiguation of the Hebrew language, which combines morphemes into a word in both agglutinative and fusional ways.

  • This paper presents the results of automatically inducing a Combinatory Categorial Grammar (CCG) lexicon from a Turkish dependency treebank. The fact that Turkish is an agglutinating free word-order language presents a challenge for language theories. We explored possible ways to obtain a compact lexicon, consistent with CCG principles, from a treebank which is an order of magnitude smaller than Penn WSJ.

  • We present a language-independent and unsupervised algorithm for the segmentation of words into morphs. The algorithm is based on a new generative probabilistic model, which makes use of relevant prior information on the length and frequency distributions of morphs in a language. Our algorithm is shown to outperform two competing algorithms when evaluated on data from a language with agglutinative morphology (Finnish), and also to perform well on English data.
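
    Several of the abstracts above concern splitting words of agglutinative languages into morphs. As a rough illustration only, the sketch below finds the most probable split of a word under a simple unigram morph model using dynamic programming. This is a deliberately simplified stand-in, not the generative model of any paper listed here: the function name `segment`, the `max_len` parameter, and the toy log-probabilities are all assumptions made for the example.

```python
import math


def segment(word, morph_logprob, max_len=10):
    """Return the highest-scoring split of `word` into known morphs, or None.

    `morph_logprob` maps each known morph to its log-probability
    (a hypothetical unigram model; real systems estimate it from data).
    """
    n = len(word)
    # best[i] = (best score for word[:i], start index of the last morph)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece in morph_logprob and best[start][0] > -math.inf:
                score = best[start][0] + morph_logprob[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no split into known morphs exists
    # Walk the back-pointers from the end of the word to recover the morphs.
    morphs, i = [], n
    while i > 0:
        start = best[i][1]
        morphs.append(word[start:i])
        i = start
    return morphs[::-1]


# Toy, hand-assigned probabilities (hypothetical, not estimated from a corpus).
toy_logprob = {
    "talo": math.log(0.05),      # Finnish stem "house"
    "i": math.log(0.1),          # plural marker
    "ssa": math.log(0.1),        # inessive case ending
    "taloissa": math.log(1e-6),  # whole word stored as a rare single morph
}

print(segment("taloissa", toy_logprob))
```

    With these toy values the three-morph split scores higher than treating the whole word as one rare morph, which is the basic intuition behind frequency-based segmentation. A model that also used prior information on morph length, as the last abstract describes, would add further terms to the score.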

