Showing 1-20 of 210 results for Arab
  • We study the challenges raised by Arabic verb and subject detection and reordering in Statistical Machine Translation (SMT). We show that post-verbal subject (VS) constructions are hard to translate because they have highly ambiguous reordering patterns when translated to English. In addition, implementing reordering is difficult because the boundaries of VS constructions are hard to detect accurately, even with a state-of-the-art Arabic dependency parser.

  • Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved highly accurate Arabic NER system.

  • Arabic handwriting recognition (HR) is a challenging problem due to Arabic’s connected letter forms, consonantal diacritics and rich morphology. In this paper we isolate the task of identifying erroneous words in HR from the task of producing corrections for these words. We consider a variety of linguistic (morphological and syntactic) and non-linguistic features to automatically identify these errors. Our best approach achieves a roughly 15% absolute increase in F-score over a simple but reasonable baseline. ...

  • We explore the contribution of morphological features – both lexical and inflectional – to dependency parsing of Arabic, a morphologically rich language. Using controlled experiments, we find that definiteness, person, number, gender, and the undiacritized lemma are most helpful for parsing on automatically tagged input.

  • Although Subjectivity and Sentiment Analysis (SSA) has been witnessing a flurry of novel research, there are few attempts to build SSA systems for Morphologically-Rich Languages (MRL). In the current study, we report efforts to partially fill this gap. We present a newly developed manually annotated corpus of Modern Standard Arabic (MSA) together with a new polarity lexicon.

  • The following pages contain nothing new and nothing original, but they do contain a good deal of information gathered from various sources, and brought together under one cover. The book itself may be useful, not, perhaps, to the Professor or to the Orientalist, but to the general reader, and to the student commencing the study of Arabic. To the latter it will give some idea of the vast field of Arabian literature that lies before him, and prepare him, perhaps, for working out a really interesting work upon the subject.

  • We present a method for generating Colloquial Egyptian Arabic (CEA) from morphologically disambiguated Modern Standard Arabic (MSA). When used in POS tagging, this process improves the accuracy from 73.24% to 86.84% on unseen CEA text, and reduces the percentage of out-of-vocabulary words from 28.98% to 16.66%.

  • If unsupervised morphological analyzers could approach the effectiveness of supervised ones, they would be a very attractive choice for improving MT performance on low-resource inflected languages. In this paper, we compare performance gains for state-of-the-art supervised vs. unsupervised morphological analyzers, using a state-of-the-art Arabic-to-English MT system.

  • The written form of Arabic, Modern Standard Arabic (MSA), differs quite a bit from the spoken dialects of Arabic, which are the true “native” languages of Arabic speakers used in daily life. However, due to MSA’s prevalence in written form, almost all Arabic datasets have predominantly MSA content. We present the Arabic Online Commentary Dataset, a 52M-word monolingual dataset rich in dialectal content, and we describe our long-term annotation effort to identify the dialect level (and dialect itself) in each sentence of the dataset. ...

  • We present an enriched version of the Penn Arabic Treebank (Maamouri et al., 2004), where latent features necessary for modeling morpho-syntactic agreement in Arabic are manually annotated. We describe our process for efficient annotation, and present the first quantitative analysis of Arabic morphosyntactic phenomena.

  • In morphologically rich languages such as Arabic, the abundance of word forms resulting from increased morpheme combinations is significantly greater than for languages with fewer inflected forms (Kirchhoff et al., 2006). This exacerbates the out-of-vocabulary (OOV) problem. Test set words are more likely to be unknown, limiting the effectiveness of the model. The goal of this study is to use the regularities of Arabic inflectional morphology to reduce the OOV problem in that language.

  • We present MAGEAD, a morphological analyzer and generator for the Arabic language family. Our work is novel in that it explicitly addresses the need for processing the morphology of the dialects. MAGEAD performs an on-line analysis to or generation from a root+pattern+features representation; it has separate phonological and orthographic representations, and it allows for combining morphemes from different dialects. We present a detailed evaluation of MAGEAD.

  • We consider the problem of NER in Arabic Wikipedia, a semisupervised domain adaptation setting for which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity types in addition to standard categories. Standard supervised learning on newswire text leads to poor target-domain recall.

  • Arabic morphology is complex, partly because of its richness, and partly because of common irregular word forms, such as broken plurals (which resemble singular nouns), and nouns with irregular gender (feminine nouns that look masculine and vice versa). In addition, Arabic morphosyntactic agreement interacts with the lexical semantic feature of rationality, which has no morphological realization. In this paper, we present a series of experiments on the automatic prediction of the latent linguistic features of functional gender and number, and rationality in Arabic.

  • Beautiful images of the United Arab Emirates, seen through the lens of a Vietnamese tourist. During our short visit to the United Arab Emirates, we had the chance to visit Dubai, a modern city of skyscrapers, seen from the sea. The streets here are 7 to 8 lanes wide, and they impressed us all the more for being spotlessly clean and perfectly smooth.

  • In the preparation of an Arabic to English sentence-for-sentence mechanical translation program, a computer has been applied to the testing of statements concerning various phases of the morphological and syntactic structure of Arabic and structural equivalences between Arabic and English.

  • Due to Arabic’s morphological complexity, Arabic retrieval benefits greatly from morphological analysis – particularly stemming. However, the best known stemming does not handle linguistic phenomena such as broken plurals and malformed stems. In this paper we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext to page title links.

  • “Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains.

  • We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that, for all the tasks we consider, both modeling the lexeme explicitly and retuning the weights of individual classifiers for the specific task improve performance.

  • There is a widely held belief in the natural language and computational linguistics communities that Semantic Role Labeling (SRL) is a significant step toward improving important applications, e.g. question answering and information extraction. In this paper, we present an SRL system for Modern Standard Arabic that exploits many aspects of the rich morphological features of the language. The experiments on the pilot Arabic Propbank data show that our system based on Support Vector Machines and Kernel Methods yields a global SRL F1 score of 82.
