Natural language annotation

Showing 1-20 of 97 results for Natural language annotation
  • Create your own natural language training corpus for machine learning. Whether you’re working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle—the process of adding metadata to your training corpus to help ML algorithms work more efficiently. You don’t need any programming or linguistics experience to get started.

    pdf97p hoa_can 26-01-2013 21 5   Download

  • This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to automatic summarization and translation. With Natural Language Processing with Python, you’ll learn how to write Python programs to work with large collections of unstructured text. You’ll access richly-annotated datasets using a comprehensive range of linguistic data structures.

    pdf504p hoa_can 26-01-2013 33 12   Download

  • This demonstration presents the Annotation Librarian, an application programming interface that supports rapid development of natural language processing (NLP) projects built in Apache Unstructured Information Management Architecture (UIMA). The flexibility of UIMA to support all types of unstructured data – images, audio, and text – increases the complexity of some of the most common NLP development tasks.

    pdf6p hongdo_1 12-04-2013 20 3   Download

  • We demonstrate an open-source natural language generation engine that produces descriptions of entities and classes in English and Greek from OWL ontologies that have been annotated with linguistic and user modeling information expressed in RDF. We also demonstrate an accompanying plug-in for the Protégé ontology editor, which can be used to create the ontology’s annotations and generate previews of the resulting texts by invoking the generation engine.

    pdf4p bunthai_1 06-05-2013 15 2   Download

  • In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence. We mainly focus on the quality of the different alignment and ME models and show that the direct ME approach outperforms the alignment templates method. ...

    pdf8p bunthai_1 06-05-2013 22 1   Download

  • We demonstrate a system for flexible querying against text that has been annotated with the results of NLP processing. The system supports self-overlapping and parallel layers, integration of syntactic and ontological hierarchies, flexibility in the format of returned results, and tight integration with SQL. We present a query language and its use on examples taken from the NLP literature.

    pdf4p bunbo_1 17-04-2013 14 2   Download

  • It is necessary to have a (large) annotated corpus to build a statistical parser. Acquisition of such a corpus is costly and time-consuming. This paper presents a method to reduce this demand using active learning, which selects what samples to annotate, instead of annotating blindly the whole training corpus. Sample selection for annotation is based upon “representativeness” and “usefulness”. A model-based distance is proposed to measure the difference of two sentences and their most likely parse trees.

    pdf8p bunmoc_1 20-04-2013 12 2   Download

  • We introduce the brat rapid annotation tool (BRAT), an intuitive web-based tool for text annotation supported by Natural Language Processing (NLP) technology. BRAT has been developed for rich structured annotation for a variety of NLP tasks and aims to support manual curation efforts and increase annotator productivity using NLP techniques.

    pdf6p bunthai_1 06-05-2013 23 2   Download

  • We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state-of-the-art for Bulgarian.

    pdf11p bunthai_1 06-05-2013 18 3   Download

  • In many natural language applications, there is a need to enrich syntactical parse trees. We present a statistical tree annotator augmenting nodes with additional information. The annotator is generic and can be applied to a variety of applications. We report 3 such applications in this paper: predicting function tags; predicting null elements; and predicting whether a tree constituent is projectable in machine translation. Our function tag prediction system outperforms significantly published results. ...

    pdf9p hongdo_1 12-04-2013 16 2   Download

  • The well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator.

    pdf10p bunthai_1 06-05-2013 12 2   Download

  • This paper describes an automatic method for creating a domain-independent sense-annotated corpus harvested from the web. As a proof of concept, this method has been applied to German, a language for which sense-annotated corpora are still in short supply. The sense inventory is taken from the German wordnet GermaNet. The web-harvesting relies on an existing mapping of GermaNet to the German version of the web-based dictionary Wiktionary.

    pdf10p bunthai_1 06-05-2013 18 2   Download

  • Whether automatically extracted or human generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into hierarchically organized, concept-level annotations, which capture not only the broad semantics of the desired arguments (e.g., ‘People’ rather than ‘Locations’), but also the correct level of generality (e.g.

    pdf11p bunthai_1 06-05-2013 14 2   Download

  • Data-driven approaches in computational semantics are not common because there are only few semantically annotated resources available. We are building a large corpus of public-domain English texts and annotate them semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse Representation Structures), including events, thematic roles, named entities, anaphora, scope, and rhetorical structure. We have created a wiki-like Web-based platform on which a crowd of expert annotators (i.e.

    pdf5p bunthai_1 06-05-2013 15 2   Download

  • In order to build robust automatic abstracting systems, there is a need for better training resources than are currently available. In this paper, we introduce an annotation scheme for scientific articles which can be used to build such a resource in a consistent way. The seven categories of the scheme are based on rhetorical moves of argumentation. Our experimental results show that the scheme is stable, reproducible and intuitive to use.

    pdf8p bunthai_1 06-05-2013 19 2   Download

  • We think the parts are of interest in their own right. The paper consists of three sections: (I) We give a detailed description of the PROLOG implementation of the parser which is based on the theory of lexical functional grammar (LFG). The parser covers the fragment described in [1,94]. I.e., it is able to analyse constructions involving functional control and long distance dependencies.

    pdf6p buncha_1 08-05-2013 20 2   Download

  • Our poster presents results and experiences from the application of the system to 300,000 word forms, a subpart of a larger corpus. The application of the system is carried out in two steps, an automatic lexical look up followed by homograph separation, which is done partly automatically, partly manually. Lexical and morphological analysis and disambiguation of Swedish is a rather complicated task, a fact which should hold for several other languages as well. Below a sample text is given, showing both the amount of information that has to be specified for each word form and the degree of...

    pdf1p buncha_1 08-05-2013 20 2   Download

  • This paper defines a language L for specifying LFG grammars. This enables constraints on LFG's composite ontology (c-structures synchronised with f-structures) to be stated directly; no appeal to the LFG construction algorithm is needed. We use L to specify schemata annotated rules and the LFG uniqueness, completeness and coherence principles. Broader issues raised by this work are noted and discussed.

    pdf6p buncha_1 08-05-2013 15 2   Download

  • Dialogue systems are one of the most challenging applications of Natural Language Processing. In recent years, some statistical dialogue models have been proposed to cope with the dialogue problem. The evaluation of these models is usually performed by using them as annotation models. Many of the works on annotation use information such as the complete sequence of dialogue turns or the correct segmentation of the dialogue. This information is not usually available for dialogue systems.

    pdf8p hongvang_1 16-04-2013 11 1   Download

  • Spoken Language Understanding (SLU) addresses the problem of extracting semantic meaning conveyed in an utterance. The traditional knowledge-based approach to this problem is very expensive -- it requires joint expertise in natural language processing and speech recognition, and best practices in language engineering for every new domain. On the other hand, a statistical learning approach needs a large amount of annotated data for model training, which is seldom available in practical applications outside of large research labs. ...

    pdf8p hongvang_1 16-04-2013 10 1   Download
