Parsing strings

We show that the problems of parsing and surface realization for grammar formalisms with “context-free” derivations, coupled with Montague semantics (under a certain restriction), can be reduced in a uniform way to Datalog query evaluation.
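As a rough illustration of the reduction idea, restricted to plain CFG recognition and leaving out Montague semantics and surface realization entirely: a grammar in Chomsky normal form can be read as Datalog-style rules over string positions. Each word yields a base fact, each binary rule A → B C becomes A(i, k) :- B(i, j), C(j, k), and recognition is the query S(0, n). The Python sketch below evaluates such rules naively to a fixpoint. The encoding, the function name `datalog_recognize`, and the toy grammar are assumptions for illustration, not the paper's construction.

```python
from itertools import product


def datalog_recognize(words, lexical, binary, start="S"):
    """Naive bottom-up (fixpoint) evaluation of a CNF grammar written as Datalog-style rules.

    lexical: dict mapping a word to the set of nonterminals A with a rule A -> word
    binary:  list of (A, B, C) triples, one per rule A -> B C
    A fact (A, i, j) means "A derives words[i:j]".
    """
    # Base facts: word positions combined with lexical rules.
    facts = {(a, i, i + 1) for i, w in enumerate(words) for a in lexical.get(w, ())}
    while True:
        new = set()
        # Rule A(i, k) :- B(i, j), C(j, k): join facts on the shared position j.
        for (b, i, j), (c, j2, k) in product(facts, repeat=2):
            if j != j2:
                continue
            for a, rb, rc in binary:
                if (rb, rc) == (b, c) and (a, i, k) not in facts:
                    new.add((a, i, k))
        if not new:  # fixpoint reached: no rule derives anything new
            break
        facts |= new
    # The query ?- S(0, n).
    return (start, 0, len(words)) in facts


# Hypothetical toy grammar: S -> NP VP, VP -> V NP.
LEXICAL = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP")]
print(datalog_recognize(["she", "eats", "fish"], LEXICAL, BINARY))  # True
print(datalog_recognize(["eats", "she"], LEXICAL, BINARY))          # False
```

Each round re-joins every pair of facts, which is what makes this evaluation naive; a semi-naive evaluator would join only against facts derived in the previous round.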

Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forest-to-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source-language parser. We propose an alternative approach to generating forests that is based on combining subtrees within the first-best parse through binarization.
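One way to picture forests obtained by binarization: a flat node with more than two children can be binarized in many ways, each introducing virtual nodes over contiguous runs of children, and pooling these alternatives around the first-best tree gives a small forest. The sketch below enumerates every binarization of a single node. The function name and the primed virtual labels are hypothetical, and the paper may well restrict which binarizations it keeps.

```python
def binarizations(label, children):
    """Yield every binary tree over the ordered children of one flat node.

    Trees are nested tuples (label, left, right); leaves are the original children.
    Intermediate (virtual) nodes are labelled label + "'", a hypothetical
    naming convention chosen for this sketch.
    """
    children = tuple(children)
    if len(children) < 2:
        raise ValueError("binarization needs a node with at least two children")
    virtual = label + "'"

    def build(lo, hi, node_label):
        if hi - lo == 1:  # a single child needs no new node
            yield children[lo]
            return
        for mid in range(lo + 1, hi):  # choose the split point
            for left in build(lo, mid, virtual):
                for right in build(mid, hi, virtual):
                    yield (node_label, left, right)

    yield from build(0, len(children), label)


trees = list(binarizations("NP", ["DT", "JJ", "JJ", "NN"]))
print(len(trees))  # 5
```

The number of binarizations grows as the Catalan number C(n-1) for n children, but only O(n²) distinct child spans occur, so a packed forest that shares one virtual node per span stays small.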

We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. Because tree-to-string translation casts decoding as a tree parsing problem rather than a string parsing problem, the decoder still runs fast when adjoining is included.

To address the parse error issue in tree-to-string translation, this paper proposes a similarity-based decoding generation (SDG) solution that reconstructs similar source parse trees at decoding time, instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method and has little impact on decoding speed in practice.

This paper proposes a forest-based tree-sequence-to-string translation model for syntax-based statistical machine translation, which automatically learns tree-sequence-to-string translation rules from word-aligned, source-side-parsed bilingual texts. The proposed model leverages the strengths of both tree-sequence-based and forest-based translation models.

We present a new approach for mapping natural language sentences to their formal meaning representations using string-kernel-based classifiers. Our system learns these classifiers for every production in the formal language grammar. Meaning representations for novel natural language sentences are obtained by finding the most probable semantic parse using these string classifiers. Our experiments on two real-world data sets show that this approach compares favorably to other existing systems and is particularly robust to noise. ...

Valiant showed that Boolean matrix multiplication (BMM) can be used for CFG parsing. We prove a dual result: CFG parsers running in time $O(|G|\,|w|^{3-\epsilon})$ on a grammar $G$ and a string $w$ can be used to multiply $m \times m$ Boolean matrices in time $O(m^{3-\epsilon/3})$. In the process we also provide a formal definition of parsing motivated by an informal notion due to Lang. Our result establishes one of the first limitations on general CFG parsing: a fast, practical CFG parser would yield a fast, practical BMM algorithm, which is not believed to exist.
1 Introduction
The standard method...
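Valiant's connection rests on the fact that the core CKY step, combining a B over (i, j) with a C over (j, k) into an A over (i, k), is a Boolean matrix product once each nonterminal is stored as a matrix indexed by string positions. The Python sketch below makes that explicit under simplifying choices of our own: it repeats plain cubic-time products until nothing changes, rather than using Valiant's subcubic closure, so it illustrates the correspondence, not the speed-up. Function names and the toy grammar are hypothetical.

```python
def bool_matmul(x, y):
    """Boolean product of square matrices: z[i][k] = OR over j of (x[i][j] AND y[j][k])."""
    n = len(x)
    return [[any(x[i][j] and y[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


def recognize_by_bmm(words, lexical, binary, start="S"):
    """CNF recognition with one Boolean matrix per nonterminal.

    m[A][i][j] is True iff A derives words[i:j]. Each rule A -> B C adds the
    product m[B] x m[C] into m[A]; we repeat until no entry changes.
    """
    n = len(words) + 1  # string positions 0..len(words)
    symbols = {s for rule in binary for s in rule}
    symbols |= {a for nts in lexical.values() for a in nts}
    m = {a: [[False] * n for _ in range(n)] for a in symbols}
    for i, w in enumerate(words):  # spans of length one
        for a in lexical.get(w, ()):
            m[a][i][i + 1] = True
    changed = True
    while changed:
        changed = False
        for a, b, c in binary:
            prod = bool_matmul(m[b], m[c])
            for i in range(n):
                for k in range(n):
                    if prod[i][k] and not m[a][i][k]:
                        m[a][i][k] = True
                        changed = True
    return start in m and m[start][0][n - 1]


# Hypothetical toy grammar: S -> NP VP, VP -> V NP.
LEXICAL = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP")]
print(recognize_by_bmm(["she", "eats", "fish"], LEXICAL, BINARY))  # True
```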

We study parsing of tree adjoining grammars with particular emphasis on the use of shared forests to represent all the parse trees deriving a well-formed string. We show that there are two distinct ways of representing the parse forest, one of which involves the use of linear indexed grammars and the other the use of context-free grammars. The work presented in this paper is intended to give a general framework for studying TAG parsing.
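For a plain CFG (not TAG), the context-free representation of a parse forest can be made concrete: the forest is itself a grammar whose nonterminals are items (A, i, j), and it generates exactly the parse trees of the input. The sketch below builds such a forest with CKY for a grammar in Chomsky normal form and counts the trees it packs without enumerating them. The CNF restriction and all names are assumptions for illustration; TAG forests need the richer machinery the abstract describes.

```python
from collections import defaultdict
from functools import lru_cache


def forest_grammar(words, lexical, binary, start="S"):
    """CKY for a CNF grammar, returning the parse forest as a CFG over items (A, i, j).

    lexical: dict word -> set of nonterminals A with A -> word
    binary:  list of (A, B, C) triples for rules A -> B C
    Returns (root, forest) where forest maps an item to its list of
    right-hand sides: either (word,) or ((B, i, j), (C, j, k)).
    """
    n = len(words)
    chart = defaultdict(set)   # (i, j) -> nonterminals deriving words[i:j]
    prods = defaultdict(list)  # item -> right-hand sides found by CKY
    for i, w in enumerate(words):
        for a in lexical.get(w, ()):
            chart[i, i + 1].add(a)
            prods[a, i, i + 1].append((w,))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for a, b, c in binary:
                    if b in chart[i, j] and c in chart[j, k]:
                        chart[i, k].add(a)
                        prods[a, i, k].append(((b, i, j), (c, j, k)))
    # Keep only items reachable from the root, so the forest grammar
    # generates exactly the parse trees of `words`.
    root = (start, 0, n)
    forest = {}
    stack = [root] if root in prods else []
    while stack:
        item = stack.pop()
        if item in forest:
            continue
        forest[item] = prods[item]
        for rhs in prods[item]:
            stack.extend(x for x in rhs if isinstance(x, tuple))
    return root, forest


def count_trees(root, forest):
    """Number of parse trees packed in the forest, computed without enumerating them."""
    @lru_cache(maxsize=None)
    def count(item):
        total = 0
        for rhs in forest[item]:
            ways = 1
            for x in rhs:
                if isinstance(x, tuple):  # a child item, not a word
                    ways *= count(x)
            total += ways
        return total
    return count(root) if root in forest else 0


# Hypothetical, highly ambiguous grammar: S -> S S, S -> "a".
root, forest = forest_grammar(["a"] * 4, {"a": {"S"}}, [("S", "S", "S")])
print(len(forest), count_trees(root, forest))  # 10 5
```

Here the number of trees grows as a Catalan number in the input length, while the forest has one item per span and O(n³) productions.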

Syntax-based translation models that operate on the output of a source-language parser have been shown to perform better if allowed to choose from a set of possible parses. In this paper, we investigate whether this is because it allows the translation stage to overcome parser errors or to override the syntactic structure itself. We find that it is primarily the latter, but that under the right conditions, the translation stage does correct parser errors, improving parsing accuracy on the Chinese Treebank. ...

We present a novel translation model based on the tree-to-string alignment template (TAT), which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source-side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. ...
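A much-simplified stand-in for template-based tree-to-string translation: each template below matches only a node and its immediate children (a TAT can cover a larger tree fragment), reorders the translated children, and may emit literal target words. The rule and lexicon formats are hypothetical, chosen only to show how a source parse tree is turned into a target string top-down.

```python
# A tree is (label, word) for a preterminal or (label, [children]) otherwise.

def translate(tree, rules, lexicon):
    """Top-down tree-to-string translation with one-level reordering templates.

    rules:   (label, child_labels) -> template; a template is a list whose items
             are ints (translate that child here) or strings (emit literally).
    lexicon: source word -> target word (words missing from it are copied).
    Both tables are hypothetical and far simpler than real TATs.
    """
    label, body = tree
    if isinstance(body, str):  # preterminal: translate the word
        return [lexicon.get(body, body)]
    key = (label, tuple(child[0] for child in body))
    template = rules.get(key, list(range(len(body))))  # default: keep source order
    out = []
    for item in template:
        if isinstance(item, int):
            out.extend(translate(body[item], rules, lexicon))
        else:
            out.append(item)
    return out


# "zai Beijing gongzuo" (work in Beijing): in Chinese the PP precedes the verb.
tree = ("VP", [("PP", [("P", "zai"), ("NR", "Beijing")]), ("VV", "gongzuo")])
rules = {("VP", ("PP", "VV")): [1, 0]}  # swap the PP and the verb
lexicon = {"zai": "in", "gongzuo": "work"}
print(" ".join(translate(tree, rules, lexicon)))  # work in Beijing
```

Without the VP template the same tree comes out in source order, "in Beijing work", which is the kind of reordering a syntax-based template captures.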

In this paper, we propose forest-to-string rules to enhance the expressive power of tree-to-string translation models. A forest-to-string rule is capable of capturing non-syntactic phrase pairs by describing the correspondence between multiple parse trees and one string. To integrate these rules into tree-to-string translation models, auxiliary rules are introduced to provide a generalization level.

In an ordinary syntactic parser, the input is a string, and the grammar ranges over strings. This paper explores generalizations of ordinary parsing algorithms that allow the input to consist of string tuples and/or the grammar to range over string tuples. Such algorithms can infer the synchronous structures hidden in parallel texts. It turns out that these generalized parsers can do most of the work required to train and apply a syntax-aware statistical machine translation system.

Stochastic unification-based grammars (SUBGs) define exponential distributions over the parses generated by a unification-based grammar (UBG). Existing algorithms for parsing and estimation require the enumeration of all of the parses of a string in order to determine the most likely one, or in order to calculate the statistics needed to estimate a grammar from a training corpus.
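To make the exponential form concrete, assume the conditional log-linear form commonly used for such models: with feature vector $f(\omega)$ for parse $\omega$ and weights $\lambda$, $P(\omega \mid y) = \exp(\lambda \cdot f(\omega)) / Z(y)$, where $Z(y) = \sum_{\omega'} \exp(\lambda \cdot f(\omega'))$ runs over all parses $\omega'$ of the string $y$. The Python sketch below computes this by enumerating the parses, which is exactly the cost the abstract points to. Parse identifiers, feature names, and weights are hypothetical.

```python
import math


def parse_distribution(parses, weights):
    """Log-linear (exponential) distribution over the parses of one string.

    parses:  dict mapping a parse id to its feature dict {feature_name: value}
    weights: dict mapping feature_name -> weight (lambda); missing features weigh 0
    Returns {parse_id: probability}. Every parse is enumerated to compute Z(y).
    """
    scores = {p: sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for p, feats in parses.items()}
    # Subtract the maximum score before exponentiating, for numerical stability.
    top = max(scores.values())
    unnorm = {p: math.exp(s - top) for p, s in scores.items()}
    z = sum(unnorm.values())
    return {p: u / z for p, u in unnorm.items()}


# Hypothetical parses of one ambiguous string, with made-up features and weights.
parses = {"attach-NP": {"np_attach": 1, "length": 3},
          "attach-VP": {"vp_attach": 1, "length": 3}}
weights = {"np_attach": 0.2, "vp_attach": 0.9, "length": -0.1}
dist = parse_distribution(parses, weights)
best = max(dist, key=dist.get)
print(best)  # attach-VP
```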

I describe a head-driven parser for a class of grammars that handle discontinuous constituency by a richer notion of string combination than ordinary concatenation. The parser is a generalization of the left-corner parser (Matsumoto et al., 1983) and can be used for grammars written in powerful formalisms such as non-concatenative versions of HPSG (Pollard, 1984; Reape, 1989).

In this paper we present a new parsing algorithm for linear indexed grammars (LIGs) in the same spirit as the one described in (Vijay-Shanker and Weir, 1993) for tree adjoining grammars. For a LIG L and an input string x of length n, we build a non-ambiguous context-free grammar whose sentences are all (and exclusively) valid derivation sequences in L which lead to x. We show that this grammar can be built in $O(n^6)$ time and that individual parses can be extracted in time linear in the size of the extracted parse tree. Though this O(n...

It is often remarked that natural language, used naturally, is unnaturally ungrammatical. Spontaneous speech contains all manner of false starts, hesitations, and self-corrections that disrupt the well-formedness of strings. It is a mystery, then, that despite this apparent wide deviation from grammatical norms, people have little difficulty understanding the non-fluent speech that is the essential medium of everyday life. And it is a still greater mystery that children can succeed in acquiring the grammar of a language on the basis of evidence provided by...

We have analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and again using basic UNIX text processing utilities such as grep and awk. This paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths.
Introduction
As natural language systems grow more sophisticated, they need larger and more detailed lexicons.

In this paper we present a polynomial-time parsing algorithm for Combinatory Categorial Grammar. The recognition phase extends the CKY algorithm for CFG. The process of generating a representation of the parse trees has two phases. First, a shared forest is built that encodes the set of all derivation trees for the input string. This shared forest is then pruned to remove all spurious ambiguity.

In the literature, Tree Adjoining Grammars (TAGs) are advocated as adequate for natural language description, covering analysis as well as generation. In this paper we concentrate on the direction of analysis. Especially important for an implementation of that task is how efficiently this can be done, i.e., how readily the word problem can be solved for TAGs. Up to now, a parser with $O(n^6)$ steps in the worst case was known, where n is the length of the input string. In this paper, the result is improved to $O(n^4 \log n)$ as a new lowest...

It is a tacit assumption of much linguistic inquiry that all distinct derivations of a string should assign distinct meanings. But despite the tidiness of such derivational uniqueness, there seems to be no a priori reason to assume that a grammar must have this property. If a grammar exhibits derivational equivalence, whereby distinct derivations of a string assign the same meanings, naive exhaustive search for all derivations will be redundant, and quite possibly intractable. In this paper we show how notions of derivation-reduction and normal form can be used to...
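A toy case of derivational equivalence: if adjacent constituents combine by function composition, which is associative, every binary bracketing of n constituents denotes the same function, so exhaustive search visits C(n-1) derivations (a Catalan number) that all mean the same thing. Keeping a single normal form, here hypothetically the right-branching derivation, removes that redundancy. This is a simplified illustration of the general idea, not the paper's definition of derivation-reduction or normal form.

```python
def bracketings(items):
    """All binary bracketings of a sequence; leaves are the items, nodes are 2-tuples."""
    if len(items) == 1:
        return [items[0]]
    trees = []
    for k in range(1, len(items)):
        for left in bracketings(items[:k]):
            for right in bracketings(items[k:]):
                trees.append((left, right))
    return trees


def meaning(tree):
    """Interpret a bracketing by function composition: (f, g) means f applied after g."""
    if callable(tree):
        return tree
    f, g = meaning(tree[0]), meaning(tree[1])
    return lambda x: f(g(x))


def is_right_branching(tree):
    """Hypothetical normal form: every left daughter is a leaf."""
    return callable(tree) or (callable(tree[0]) and is_right_branching(tree[1]))


functions = [lambda x: x + 1, lambda x: 2 * x, lambda x: x - 3, lambda x: x * x]
derivations = bracketings(functions)
# Evaluate every derivation on the same inputs and collect the distinct results.
meanings = {tuple(meaning(t)(x) for x in range(-5, 6)) for t in derivations}
print(len(derivations), len(meanings), sum(map(is_right_branching, derivations)))  # 5 1 1
```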