Syntactic information acquisition

  • The question addressed here is what size and composition a corpus must have in order to provide the necessary and sufficient information for Machine Learning techniques to induce that type of information. Corpus representativeness is a widely discussed topic, especially in corpus linguistics. One of the standard references is Biber (1993), in which the author offers guidelines for designing a corpus to characterize a language.

  • To facilitate the use of syntactic information in the study of child language acquisition, a coding scheme for Grammatical Relations (GRs) in transcripts of parent-child dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We discuss the use of current NLP techniques to produce the GRs in this annotation scheme. By using a statistical parser (Charniak, 2000) and memory-based learning tools for classification (Daelemans et al., 2004), we obtain high precision and recall of several GRs. ...

  • Acquiring information systems specifications from natural language descriptions is presented as a problem class that requires a different treatment of semantics than other applied NL systems, such as database and operating system interfaces. Within this problem class, the specific task of obtaining explicit conceptual data models from natural language text or dialogue is being investigated. The knowledge brought to bear on this task is classified into syntactic, semantic and systems analysis knowledge.

  • Paraphrases have proved to be useful in many applications, including Machine Translation, Question Answering, Summarization, and Information Retrieval. Paraphrase acquisition methods that use a single monolingual corpus often produce only syntactic paraphrases. We present a method for obtaining surface paraphrases, using a 150GB (25 billion words) monolingual corpus. Our method achieves an accuracy of around 70% on the paraphrase acquisition task. We further show that we can use these paraphrases to generate surface patterns for relation extraction.

  • We discuss ways of allowing the users of a natural language processor to define, examine, and modify the definitions of any domain-specific words or phrases known to the system. An implementation of this work forms a critical portion of the knowledge acquisition component of our Transportable English-Language Interface (TELI), which answers English questions about tabular (first normal-form) data files and runs on a Symbolics Lisp Machine.

  • The lexicons for Knowledge-Based Machine Translation systems require knowledge-intensive morphological, syntactic and semantic information. This information is often used in different ways and is usually formatted for a specific NLP system, which tends to make both the acquisition and the maintenance of lexical databases cumbersome, inefficient and error-prone. To solve these problems, we have developed a program called COOL, which automates the acquisition and maintenance processes and allows us to standardize and centralize the databases. ...

