This paper proposes an efficient method of sentence retrieval based on syntactic structure. Collins proposed the Tree Kernel to calculate structural similarity. However, structural retrieval based on the Tree Kernel is not practicable because the index table it requires grows impractically large. We propose two more efficient algorithms that approximate the Tree Kernel: Tree Overlapping and Subpath Set.
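The Subpath Set idea above can be pictured with a small sketch: instead of matching all subtrees, as the Tree Kernel does, index only each parse tree's vertical label paths and score similarity by set overlap. The `(label, children)` tuple encoding and the Jaccard score below are illustrative assumptions, not the paper's actual formulation.

```python
# Illustrative sketch (not the paper's exact algorithm): approximate
# Tree Kernel similarity by comparing each tree's set of vertical
# label paths ("subpaths"). A tree is a (label, children) pair.

def root_paths(tree):
    """All downward label paths that start at this tree's root."""
    label, children = tree
    paths = {(label,)}
    for child in children:
        for p in root_paths(child):
            paths.add((label,) + p)
    return paths

def subpath_set(tree):
    """All downward label paths starting at any node of the tree."""
    _, children = tree
    paths = set(root_paths(tree))
    for child in children:
        paths |= subpath_set(child)
    return paths

def subpath_similarity(t1, t2):
    """Jaccard overlap of the two subpath sets (an assumed scoring choice)."""
    s1, s2 = subpath_set(t1), subpath_set(t2)
    return len(s1 & s2) / len(s1 | s2)

# Example: the tree S -> NP(N) VP(V) as nested (label, children) pairs.
t = ("S", [("NP", [("N", [])]), ("VP", [("V", [])])])
```

Because each sentence contributes only a set of label paths rather than all of its subtrees, the resulting index table stays far smaller than one keyed by whole tree fragments.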
We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features.
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data. ...
This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm.
We present two approaches for syntactic and semantic transfer based on LFG f-structures and compare the results with existing co-description and restriction-operator based approaches, focusing on aspects of ambiguity-preserving transfer, complex cases of syntactic structural mismatches, as well as on modularity and reusability. The two transfer approaches are interfaced with an existing, implemented transfer component (Verbmobil), by translating f-structures into a term language, and by interfacing f-structure representations with an existing semantics-based transfer approach, respectively. ...
Recent work by Nerbonne and Wiersma (2006) has provided a foundation for measuring syntactic differences between corpora. It uses part-of-speech trigrams as an approximation to syntactic structure, comparing the trigrams of two corpora for statistically significant differences. This paper extends the method and its application. It extends the method by using the leaf-ancestor paths of Sampson (2000) instead of trigrams, which capture internal syntactic structure: every leaf in a parse tree records the path back to the root.
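The leaf-ancestor representation just described is easy to picture: every leaf is labeled with the chain of node labels from the root down to it. A minimal sketch, assuming a simple `(label, children)` tuple encoding of parse trees (the encoding is an assumption for illustration, not the paper's):

```python
# Sketch of Sampson-style leaf-ancestor paths: each leaf of a parse
# tree is mapped to the sequence of labels from the root down to it.
# The (label, children) tuple encoding is assumed for illustration.

def leaf_ancestor_paths(tree, ancestors=()):
    label, children = tree
    if not children:                      # leaf: emit root-to-leaf label path
        return [ancestors + (label,)]
    paths = []
    for child in children:
        paths.extend(leaf_ancestor_paths(child, ancestors + (label,)))
    return paths

# Example: the tree S -> NP(the cat) VP(sleeps), with words as leaves.
t = ("S", [("NP", [("DT", [("the", [])]), ("NN", [("cat", [])])]),
           ("VP", [("VB", [("sleeps", [])])])])
```

The resulting paths can then be counted and compared across two corpora in place of the POS trigrams, using the same test for statistically significant differences.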
This paper surveys some issues that arise in the study of the syntax and semantics of natural languages (NL's) and have potential relevance to the automatic recognition, parsing, and translation of NL's. An attempt is made to take into account the fact that parsing is scarcely ever thought about with reference to syntax alone; semantic ulterior motives always underlie the assignment of a syntactic structure to a sentence.
This report describes the development of a parsing system for written Swedish and is focused on a grammar, the main component of the system, semiautomatically extracted from corpora. A cascaded, finite-state algorithm is applied to the grammar in which the input contains coarse-grained semantic class information, and the output produced reflects not only the syntactic structure of the input, but grammatical functions as well. The grammar has been tested on a variety of random samples of different text genres, achieving precision and recall of 94.62% and 91.
Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task.
The paper discusses three different kinds of syntactic ill-formedness: ellipsis, conjunctions, and actual syntactic errors. It is shown how a new grammatical formalism, based on a two-level representation of syntactic knowledge, is used to cope with ill-formed sentences. The basic control structure of the parser is briefly sketched; the paper shows that it can be applied without any substantial change both to correct and to ill-formed sentences.
The structure imposed upon spoken sentences by intonation seems frequently to be orthogonal to their traditional surface-syntactic structure. However, the notion of "intonational structure" as formulated by Pierrehumbert, Selkirk, and others, can be subsumed under a rather different notion of syntactic surface structure that emerges from a theory of grammar based on a "Combinatory" extension to Categorial Grammar.
The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint sequence of words and binary parse structure with headword annotation. The model, its probabilistic parametrization, and a set of experiments meant to evaluate its predictive power are presented.
If natural language had been designed by a logician, idioms would not exist. They are a feature of discourse that frustrates any simple logical account of how the meanings of utterances depend on the meanings of their parts and on the syntactic relations among those parts. Idioms are transparent to native speakers, but a source of perplexity to those who are acquiring a second language. If someone tells me that Mrs. Thatcher has become the Queen of Scotland, I am likely to say: "That's a tall story."
Morphological processes in Semitic languages deliver space-delimited words which introduce multiple, distinct, syntactic units into the structure of the input sentence. These words are in turn highly ambiguous, breaking the assumption underlying most parsers that the yield of a tree for a given sentence is known in advance. Here we propose a single joint model for performing both morphological segmentation and syntactic disambiguation which bypasses the associated circularity.
We study the impact of syntactic and shallow semantic information in automatic classification of questions and answers and answer re-ranking. We define (a) new tree structures based on shallow semantics encoded in Predicate Argument Structures (PASs) and (b) new kernel functions to exploit the representational power of such structures with Support Vector Machines. Our experiments suggest that syntactic information helps tasks such as question/answer classification and that shallow semantics makes a remarkable contribution when a reliable set of PASs can be extracted, e.g. from answers.
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure.
Adequate mechanical translation can be based only on adequate structural descriptions of the languages involved and on an adequate statement of equivalences. Translation is conceived of as a three-step process: recognition of the structure of the incoming text in terms of a structural specifier; transfer of this specifier into a structural specifier in the other language; and construction to order of the output text specified.
Developing features has been shown crucial to advancing the state of the art in Semantic Role Labeling (SRL). To improve Chinese SRL, we propose a set of additional features, some of which are designed to better capture structural information. Our system achieves an F-measure of 93.49, a significant improvement over the best reported performance of 92.0. We are further concerned with the effect of parsing in Chinese SRL. We empirically analyze the two-fold effect, grouping words into constituents and providing syntactic information. ...
The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach.
Spontaneously produced speech text often includes disfluencies which make it difficult to analyze underlying structure. Successful reconstruction of this text would transform these errorful utterances into fluent strings and offer an alternate mechanism for analysis. Our investigation of naturally-occurring spontaneous speaker errors aligned to corrected text with manual semantico-syntactic analysis yields new insight into the syntactic and structural semantic differences between spoken and reconstructed language. ...