Parsing and text processing

Showing 1-20 of 38 results for Parsing and text processing
  • We have analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and again using basic UNIX text processing utilities such as grep and awk. This paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths. Introduction: As natural language systems grow more sophisticated, they need larger and more detailed lexicons.

    pdf8p bungio_1 03-05-2013 38 1   Download

  • We present an architecture for the integration of shallow and deep NLP components which is aimed at flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition.

    pdf8p bunmoc_1 20-04-2013 27 2   Download

  • This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponential-sized representations of parse trees, such as the “all subtrees” (DOP) representation described by (Bod 1998), or a representation tracking all sub-fragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data. ...

    pdf8p bunmoc_1 20-04-2013 29 1   Download

  • Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). Its applications can be found in many other NLP tasks such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In the investigation conducted in this paper, we utilize the techniques of two widely-used toolkits, ClearNLP and Stanford POS Tagger, and develop two new POS taggers for Vietnamese, then compare them to three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger.

    pdf15p truongtien_09 10-04-2018 21 2   Download

  • MACAON is a tool suite for standard NLP tasks developed for French. MACAON has been designed to process both human-produced text and highly ambiguous word-lattices produced by NLP tools. MACAON is made of several native modules for common tasks such as tokenization, part-of-speech tagging or syntactic parsing, all communicating with each other through XML files. In addition, exchange protocols with external tools are easily definable. MACAON is a fast, modular and open tool, distributed under GNU Public License. ...

    pdf6p hongdo_1 12-04-2013 29 3   Download

  • Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. ...

    pdf8p bunmoc_1 20-04-2013 43 3   Download

  • While various aspects of syntactic structure have been shown to bear on the determination of phrase-level prosody, the text-to-speech field has lacked a robust working system to test the possible relations between syntax and prosody. We describe an implemented system which uses the deterministic parser Fidditch to create the input for a set of prosody rules.

    pdf11p bungio_1 03-05-2013 36 3   Download

  • This paper proposes an approach to full parsing suitable for Information Extraction from texts. Sequences of cascades of rules deterministically analyze the text, building unambiguous structures. Initially basic chunks are analyzed; then argumental relations are recognized; finally modifier attachment is performed and the global parse tree is built. The approach was proven to work for three languages and different domains. It was implemented in the IE module of FACILE, an EU project for multilingual text classification and IE. ...

    pdf8p bunthai_1 06-05-2013 41 3   Download

  • The integration of sophisticated inference-based techniques into natural language processing applications first requires a reliable method of encoding the predicate-argument structure of the propositional content of text. Recent statistical approaches to automated predicate-argument annotation have utilized parse tree paths as predictive features, which encode the path between a verb predicate and a node in the parse tree that governs its argument.

    pdf8p hongvang_1 16-04-2013 30 2   Download

  • A description will be given of a procedure to assign the most likely probabilities to each of the rules of a given context-free grammar. The grammar developed by S. Kuno at Harvard University was picked as the basis and was successfully augmented with rule probabilities. A brief exposition of the method with some preliminary results, when used as a device for disambiguating parsing English texts picked from natural corpus, will be given.

    pdf4p bungio_1 03-05-2013 24 2   Download
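The abstract above does not say how its rule probabilities are obtained. As a minimal illustration of what "augmenting a context-free grammar with rule probabilities" can mean, the sketch below uses plain relative-frequency estimation from observed rule uses. This is a swapped-in, textbook technique and not the paper's procedure; the function name `estimate_rule_probs` and the toy rules are hypothetical.

```python
from collections import Counter

def estimate_rule_probs(rule_uses):
    """Relative-frequency estimate of P(rhs | lhs) for each grammar rule.

    rule_uses: iterable of (lhs, rhs) pairs, one per observed rule use,
               where rhs is a tuple of symbols.
    Note: this is an illustrative assumption, not the method of the paper.
    """
    counts = Counter(rule_uses)
    # Total number of uses of each left-hand-side symbol.
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    # Each rule's probability is its share of its left-hand side's uses.
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Toy, hypothetical rule uses (not drawn from the Kuno grammar).
uses = [
    ("NP", ("DET", "N")),
    ("NP", ("DET", "N")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
]
probs = estimate_rule_probs(uses)
```

With these estimates, the probabilities of all rules sharing a left-hand side sum to 1, which is what lets a parser rank competing analyses of an ambiguous sentence.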

  • The paper describes the development of software for automatic grammatical analysis of unrestricted, unedited English text at the Unit for Computer Research on the English Language (UCREL) at the University of Lancaster. The work is currently funded by IBM and carried out in collaboration with colleagues at IBM UK (Winchester) and IBM Yorktown Heights. The paper will focus on the lexicon component of the word tagging system, the UCREL grammar, the databanks of parsed sentences, and the tools that have been...

    pdf6p bungio_1 03-05-2013 33 2   Download

  • We report work in progress on adding affect-detection to an existing program for virtual dramatic improvisation, monitored by a human director. To partially automate the director's functions, we have partially implemented the detection of emotions, etc. in users’ text input, by means of pattern-matching, robust parsing and some semantic analysis. The work also involves basic research into how affect is conveyed by metaphor.

    pdf4p bunthai_1 06-05-2013 24 2   Download

  • We investigate the possibility of exploiting character-based dependency for Chinese information processing. As Chinese text is made up of character sequences rather than word sequences, the word is not as natural a concept in Chinese as in English, nor is it easy to define without controversy for such a language. Therefore we propose a character-level dependency scheme to represent primary linguistic relationships within a Chinese sentence. The usefulness of character dependencies is verified through two specialized dependency parsing tasks.

    pdf9p bunthai_1 06-05-2013 30 2   Download

  • Dividing sentences into chunks of words is a useful preprocessing step for parsing, information extraction and information retrieval. (Ramshaw and Marcus, 1995) have introduced a "convenient" data representation for chunking by converting it to a tagging task. In this paper we will examine seven different data representations for the problem of recognizing noun phrase chunks. We will show that the data representation choice has a minor influence on chunking performance.

    pdf7p bunthai_1 06-05-2013 31 2   Download
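To make "converting chunking to a tagging task" concrete, the sketch below encodes noun-phrase spans as one tag per token: B for the first token of a chunk, I for tokens inside a chunk, O for tokens outside any chunk. This is the variant often called IOB2; whether it coincides with any of the seven representations examined in the paper is an assumption, and the function name `chunks_to_iob` and the example sentence are hypothetical.

```python
def chunks_to_iob(tokens, np_spans):
    """Encode noun-phrase chunks as one B/I/O tag per token (IOB2 style).

    tokens:   list of word strings.
    np_spans: list of (start, end) half-open index pairs, assumed to be
              non-overlapping and within range.
    """
    tags = ["O"] * len(tokens)        # default: outside any chunk
    for start, end in np_spans:
        tags[start] = "B"             # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = "I"             # remaining tokens of the chunk
    return tags

# Hypothetical example sentence with two noun-phrase chunks.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
tags = chunks_to_iob(tokens, [(0, 2), (4, 6)])
```

Once chunks are expressed this way, any sequence tagger that predicts one label per token can be trained to recognize them.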

  • In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96). ... is derived from the analysis given in Generalized Phrase Structure Grammar (Gazdar et al. 95).

    pdf8p bunthai_1 06-05-2013 50 2   Download

  • We derive the rhetorical structures of texts by means of two new, surface-form-based algorithms: one that identifies discourse usages of cue phrases and breaks sentences into clauses, and one that produces valid rhetorical structure trees for unrestricted natural language texts. The algorithms use information that was derived from a corpus analysis of cue phrases.

    pdf8p bunthai_1 06-05-2013 35 2   Download

  • The Constituent Likelihood Automatic Word-tagging System (CLAWS) was originally designed for the low-level grammatical analysis of the million-word LOB Corpus of English text samples. CLAWS does not attempt a full parse, but uses a first-order Markov model of language to assign word-class labels to words. CLAWS can be modified to detect grammatical errors, essentially by flagging unlikely word-class transitions in the input text.

    pdf8p buncha_1 08-05-2013 29 2   Download
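The idea of flagging unlikely word-class transitions can be sketched with a first-order (bigram) model over tags. The code below is a minimal illustration under stated assumptions, not CLAWS itself: the tag names, the `<s>` start symbol, the unsmoothed estimates and the 0.05 threshold are all hypothetical choices.

```python
from collections import Counter

def train_transitions(tagged_sentences):
    """Count tag-to-tag transitions, with "<s>" marking sentence start."""
    pair_counts = Counter()
    prev_counts = Counter()
    for tags in tagged_sentences:
        seq = ["<s>"] + list(tags)
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1
    return pair_counts, prev_counts

def transition_prob(pair_counts, prev_counts, prev, cur):
    """Unsmoothed estimate of P(cur | prev); 0.0 if prev was never seen."""
    if prev_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, cur)] / prev_counts[prev]

def flag_unlikely(tags, pair_counts, prev_counts, threshold=0.05):
    """Return positions in `tags` reached by a transition below threshold."""
    seq = ["<s>"] + list(tags)
    flagged = []
    for i, (prev, cur) in enumerate(zip(seq, seq[1:])):
        if transition_prob(pair_counts, prev_counts, prev, cur) < threshold:
            flagged.append(i)  # i indexes `cur` within the original tags
    return flagged

# Toy training data with hypothetical tag names (not the CLAWS tagset).
training = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB"],
]
pair_counts, prev_counts = train_transitions(training)
suspicious = flag_unlikely(["DET", "VERB", "NOUN"], pair_counts, prev_counts)
```

A practical system would smooth the estimates so that transitions merely unseen in training are not treated as impossible.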

  • Acquiring information systems specifications from natural language description is presented as a problem class that requires a different treatment of semantics when compared with other applied NL systems such as database and operating system interfaces. Within this problem class, the specific task of obtaining explicit conceptual data models from natural language text or dialogue is being investigated. The knowledge brought to bear on this task is classified into syntactic, semantic and systems analysis knowledge.

    pdf8p buncha_1 08-05-2013 42 2   Download

  • This paper presents a rapid and robust parsing system currently used to learn from large bodies of unedited text. The system contains a multivalued part-of-speech disambiguator and a novel parser employing bottom-up recognition to find the constituent phrases of larger structures that might be too difficult to analyze. The results of applying the disambiguator and parser to large sections of the Lancaster/Oslo-Bergen corpus are presented. Introduction: We have implemented and tested a parsing system which is rapid and robust enough to apply to large bodies of unedited text. ...

    pdf9p bungio_1 03-05-2013 30 1   Download

  • This paper describes a new hardware algorithm for morpheme extraction and its implementation on a specific machine (MEX-I), as the first step toward achieving natural language parsing accelerators. It also shows the machine's performance, 100-1,000 times faster than a personal computer. This machine can extract morphemes from a 10,000-character Japanese text by searching an 80,000-morpheme dictionary in 1 second. It can treat multiple text streams, which are composed of character candidates, as well as one text stream.

    pdf8p bungio_1 03-05-2013 33 1   Download


