Lexical structure

  • This textbook, like all textbooks, was born of necessity. When I went looking for a suitable textbook for my course on Lexical-Functional Grammar at the Hebrew University of Jerusalem, I discovered that there wasn’t one. So I decided to write one, based on my lecture notes. The writing accelerated when, while I was on sabbatical at Stanford University (August 1999–February 2000), Dikran Karagueuzian of CSLI Publications expressed interest in publishing it.

  • Sentiment analysis of citations in scientific papers and articles is a new and interesting problem due to the many linguistic differences between scientific texts and other genres. In this paper, we focus on the problem of automatic identification of positive and negative sentiment polarity in citations to scientific papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-specific lexical features, dependency relations, sentence splitting and negation features. ...

  • This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm.

  • We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies.

  • In this paper, we develop a methodology for discovering the thematic structure of the Qur’an based on a fundamental idea in data mining and related disciplines: that, with respect to some collection of texts, the lexical frequency profiles of the individual texts are a good indicator of their conceptual content, and thus provide a reliable criterion for their classification relative to one another.

  • This paper proposes a method for packing feature structures, which automatically collapses equivalent parts of lexical/phrasal feature structures of HPSG into a single packed feature structure. This method avoids redundant repetition of unification of those parts. Preliminary experiments show that this method can significantly improve unification speed in parsing.

  • Verbal and compositional lexical aspect provide the underlying temporal structure of events. Knowledge of lexical aspect, e.g., (a)telicity, is therefore required for interpreting event sequences in discourse (Dowty, 1986; Moens and Steedman, 1988; Passonneau, 1988), interfacing to temporal databases (Androutsopoulos, 1996), processing temporal modifiers (Antonisse, 1994), describing allowable alternations and their semantic effects (Resnik, 1996; Tenny, 1994), and selecting tense and lexical items for natural language generation ((Dorr and Olsen, 1996; Klavans and Chodorow, 1992), cf. ...

  • Lexicalized reordering models play a crucial role in phrase-based translation systems. They are usually learned from the word-aligned bilingual corpus by examining the reordering relations of adjacent phrases. Instead of just checking whether there is one phrase adjacent to a given phrase, we argue that it is important to take the number of adjacent phrases into account for better estimations of reordering models. We propose to use a structure named reordering graph, which represents all phrase segmentations of a sentence pair, to learn lexicalized reordering models efficiently. ...

  • This paper introduces a machine learning method based on Bayesian networks which is applied to the mapping between deep semantic representations and lexical semantic resources. A probabilistic model comprising Minimal Recursion Semantics (MRS) structures and lexicalist-oriented semantic features is acquired. Lexical semantic roles enriching the MRS structures are inferred, which are useful to improve the accuracy of deep semantic parsing.

  • This paper describes a reader-based experiment on lexical cohesion, detailing the task given to readers and the analysis of the experimental data. We conclude with a discussion of the usefulness of the data in future research on lexical cohesion. Cohesive ties between items in a text draw on the resources of a language to build up the text’s unity (Halliday and Hasan, 1976). Lexical cohesive ties draw on the lexicon, i.e., on word meanings.

  • We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n-best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP hypothesis, which states that parse accuracy increases with increasing fragment size, is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator performs worse than a discounted frequency estimator; and (4) LFG-DOP significantly outperforms Tree-DOP if evaluated on tree structures only.

  • In wide-coverage lexicalized grammars many of the elementary structures have substructures in common. This means that in conventional parsing algorithms some of the computation associated with different structures is duplicated. In this paper we describe a precompilation technique for such grammars which allows some of this computation to be shared. In our approach the elementary structures of the grammar are transformed into finite state automata which can be merged and minimised using standard algorithms, and then parsed using an automaton-based parser. ...

  • As a lexical knowledge base constructed automatically from the definitions and example sentences in two machine-readable dictionaries (MRDs), MindNet embodies several features that distinguish it from prior work with MRDs. It is, however, more than this static resource alone. MindNet represents a general methodology for acquiring, structuring, accessing, and exploiting semantic information from natural language text.

  • This paper shows how DATR, a widely used formal language for lexical knowledge representation, can be used to define an LTAG lexicon as an inheritance hierarchy with internal lexical rules. A bottom-up featural encoding is used for LTAG trees and this allows lexical rules to be implemented as covariation constraints within feature structures. Such an approach eliminates the considerable redundancy otherwise associated with an LTAG lexicon.

  • This paper describes a structure-sharing method for the representation of complex phrase types in a parser for PATR-II, a unification-based grammar formalism. In parsers for unification-based grammar formalisms, complex phrase types are derived by incremental refinement of the phrase types defined in grammar rules and lexical entries. In a naive implementation, a new phrase type is built by copying older ones and then combining the copies according to the constraints stated in a grammar rule. ...

  • Taking examples from English and French idioms, this paper shows that not only constituent structure rules but also most syntactic rules (such as topicalization, wh-question, pronominalization ...) are subject to lexical constraints (on top of syntactic, and possibly semantic, ones). We show that such puzzling phenomena are naturally handled in a 'lexicalized' formalism such as Tree Adjoining Grammar. The extended domain of locality of TAGs also allows one to 'lexicalize' syntactic rules while defining them at the level of constituent structures. ...

  • Topological Dependency Grammar (TDG) is a lexicalized dependency grammar formalism, able to model languages with a relatively free word order. In such languages, word order variation often has an important function: the realization of information structure. The paper discusses how to integrate information structure into TDG, and presents a constraint-based approach to modelling information structure and the various means to realize it, focusing on (possibly simultaneous use of) word order and tune. ...

  • Probabilistic accounts of language processing can be psychologically tested by comparing word-reading times (RT) to the conditional word probabilities estimated by language models. Using surprisal as a linking function, a significant correlation between unlexicalized surprisal and RT has been reported (e.g., Demberg and Keller, 2008), but success using lexicalized models has been limited. In this study, phrase structure grammars and recurrent neural networks estimated both lexicalized and unlexicalized surprisal for words of independent sentences from narrative sources. ...

  • Widely accepted resources for semantic parsing, such as PropBank and FrameNet, are not perfect as a semantic role labeling framework. Their semantic roles are not strictly defined; therefore, their meanings and semantic characteristics are unclear. In addition, it is presupposed that a single semantic role is assigned to each syntactic argument. This is not necessarily true when we consider internal structures of verb semantics. We propose a new framework for semantic role annotation which solves these problems by extending the theory of lexical conceptual structure (LCS). ...

  • I will argue in this paper that the standard notions of affectedness, change-of-state and result state are too coarse-grained, and I will substantially revise and enrich their content, increasing their role in a compositional aspect construal procedure. I will claim in particular that a proper theory of event structure requires that enriched result states be lexically represented, and on them I will base a computational treatment of event structure within a feature-structure-based lexicon. ...
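The citation-sentiment abstract lists n-grams and negation features among the features it explores. One common way to combine the two is to prefix every token inside a negation scope with a marker before extracting n-grams. The sketch below assumes a scope that runs from a negation cue to the next punctuation mark and uses a small hypothetical cue list. It illustrates that general heuristic and is not the paper's feature set.

```python
import re
from collections import Counter

# Hypothetical negation cues; a real system would use a curated lexicon.
NEGATION_CUES = {"not", "no", "never", "cannot", "without"}
# In this simple heuristic, any of these punctuation tokens closes a negation scope.
SCOPE_END = re.compile(r"^[.,;:!?]$")


def tokenize(sentence: str) -> list[str]:
    """Lowercase and split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def mark_negation(tokens: list[str]) -> list[str]:
    """Prefix tokens after a negation cue with NOT_ until the next punctuation mark."""
    marked, in_scope = [], False
    for tok in tokens:
        if SCOPE_END.match(tok):
            in_scope = False
            marked.append(tok)
        elif tok in NEGATION_CUES:
            in_scope = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if in_scope else tok)
    return marked


def ngram_features(sentence: str, n_max: int = 2) -> Counter:
    """Count all n-grams of length 1..n_max over the negation-marked tokens."""
    tokens = mark_negation(tokenize(sentence))
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats
```

For "This method does not scale well, unlike prior work.", the features include `NOT_scale` and `not NOT_scale`, while `unlike` stays unmarked because the comma ends the scope.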
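The Qur'an abstract rests on the idea that lexical frequency profiles indicate conceptual content and can be used to classify texts relative to one another. A minimal sketch of that idea follows. Each text becomes a vector of relative word frequencies, and texts are compared by cosine similarity. The nearest-neighbour grouping is an assumption made for illustration and stands in for whatever clustering the paper actually uses.

```python
import math
import re
from collections import Counter


def frequency_profile(text: str) -> Counter:
    """Relative frequency of each lowercased word in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return Counter({w: c / total for w, c in counts.items()})


def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def most_similar(texts: dict[str, str]) -> dict[str, str]:
    """Map each text label to the label of the other text with the closest profile."""
    profiles = {name: frequency_profile(t) for name, t in texts.items()}
    return {
        a: max((b for b in profiles if b != a),
               key=lambda b: cosine(profiles[a], profiles[b]))
        for a in profiles
    }
```

Because the vectors hold relative rather than raw frequencies, texts of very different lengths can still be compared.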
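The HPSG packing abstract and the PATR-II structure-sharing abstract both aim to avoid building the same substructure more than once. The sketch below illustrates the general idea with hash-consing: equivalent substructures are interned so that each is stored once and shared by identity. This is a swapped-in, generic technique and not either paper's algorithm. It assumes feature structures without reentrancy and does not perform unification.

```python
from typing import Dict, Union

# A feature structure is an atomic string or a dict from feature names to
# feature structures. Reentrancy (shared variables) is ignored in this sketch.
FeatureStructure = Union[str, Dict[str, "FeatureStructure"]]


class FSTable:
    """Hash-consing table: structurally equal (sub)structures map to one object."""

    def __init__(self) -> None:
        self._table: dict = {}

    def intern(self, fs: FeatureStructure):
        """Return the canonical shared representation of fs.

        Dicts become sorted tuples of (feature, value) pairs. Two equivalent
        structures therefore produce equal tuples, and setdefault returns the
        copy that was stored first.
        """
        if isinstance(fs, str):
            canon = fs
        else:
            canon = tuple(sorted((feat, self.intern(val)) for feat, val in fs.items()))
        return self._table.setdefault(canon, canon)

    def __len__(self) -> int:
        """Number of distinct (sub)structures stored."""
        return len(self._table)
```

When two entries share an identical `agr` value, interning both leaves a single `agr` object in the table. Both interned entries then point to it rather than to separate copies.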
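The reordering-graph abstract builds on lexicalized reordering models learned from the reordering relations of adjacent phrases. For background, the sketch below shows the common monotone/swap/discontinuous (MSD) orientation classification for a single phrase segmentation. It does not implement the paper's reordering graph or its counting of multiple adjacent phrases. Input is assumed to be the source-side spans of the phrases, listed in target order.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive source-side word span of one phrase


def msd_orientations(source_spans: List[Span]) -> List[str]:
    """Classify each phrase's orientation relative to the previous phrase.

    source_spans gives the source spans of the phrases in target order.
    """
    orientations = []
    prev_start, prev_end = -1, -1  # virtual phrase just before the sentence start
    for start, end in source_spans:
        if start == prev_end + 1:
            orientations.append("monotone")       # continues right after the previous phrase
        elif end + 1 == prev_start:
            orientations.append("swap")           # sits immediately before the previous phrase
        else:
            orientations.append("discontinuous")  # not adjacent to the previous phrase
        prev_start, prev_end = start, end
    return orientations
```

Counting these labels per phrase pair over a word-aligned corpus yields the raw statistics that a lexicalized reordering model is estimated from.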
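The precompilation abstract turns elementary structures into finite state automata that can be merged and minimised with standard algorithms. The sketch below makes a simplifying assumption: each elementary structure has already been flattened into a sequence of symbols. It then builds a prefix tree and merges equivalent states bottom-up, which is the usual construction of a minimal acyclic automaton. The flattening and the symbol names are hypothetical and do not reproduce the paper's encoding.

```python
from typing import Dict, List, Sequence


class State:
    def __init__(self) -> None:
        self.final = False
        self.edges: Dict[str, "State"] = {}


def build_trie(sequences: Sequence[Sequence[str]]) -> State:
    """Prefix tree in which sequences with a common prefix share states."""
    root = State()
    for seq in sequences:
        node = root
        for sym in seq:
            if sym not in node.edges:
                node.edges[sym] = State()
            node = node.edges[sym]
        node.final = True
    return root


def minimize(root: State) -> State:
    """Merge states that have the same finality and identical outgoing transitions."""
    register: Dict[tuple, State] = {}

    def canon(state: State) -> State:
        # Post-order: replace children by their canonical representatives first.
        state.edges = {sym: canon(child) for sym, child in state.edges.items()}
        signature = (state.final,
                     tuple(sorted((sym, id(child)) for sym, child in state.edges.items())))
        return register.setdefault(signature, state)

    return canon(root)


def count_states(root: State) -> int:
    """Number of distinct states reachable from root."""
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) in seen:
            continue
        seen.add(id(s))
        stack.extend(s.edges.values())
    return len(seen)


def accepts(root: State, seq: List[str]) -> bool:
    """True if the automaton accepts the symbol sequence."""
    node = root
    for sym in seq:
        if sym not in node.edges:
            return False
        node = node.edges[sym]
    return node.final
```

For the two sequences `a b` and `c b`, the prefix tree has 5 states and the minimised automaton has 3, because the shared suffix is now stored only once.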
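The DATR/LTAG abstract removes redundancy by organising the lexicon as an inheritance hierarchy. The sketch below shows only the core mechanism of default inheritance: a node returns its own value for a path when it has one and otherwise defers to its parent. It is written in plain Python and does not use DATR syntax. It also omits lexical rules and tree encodings, and the node and path names are hypothetical.

```python
from typing import Dict, Optional


class LexNode:
    """A lexicon node with local path values and an optional parent to inherit from."""

    def __init__(self, name: str, parent: Optional["LexNode"], local: Dict[str, str]) -> None:
        self.name = name
        self.parent = parent
        self.local = dict(local)

    def get(self, path: str) -> Optional[str]:
        """Return the local value if defined, else the nearest ancestor's value (the default)."""
        node: Optional[LexNode] = self
        while node is not None:
            if path in node.local:
                return node.local[path]
            node = node.parent
        return None
```

For example, a hypothetical `VERB` node can state `cat = v` once, and each verb entry inherits it. Irregular entries override only the paths where they differ.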
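The reading-time abstract uses surprisal, −log₂ P(wᵢ | context), as the linking function between model probabilities and reading times. As a minimal sketch, the code below computes per-word surprisal from an add-one-smoothed bigram model. This bigram model is swapped in for illustration only; the study itself uses phrase structure grammars and recurrent neural networks. Running the same code over part-of-speech tags instead of words would give a rough analogue of unlexicalized surprisal.

```python
import math
from collections import Counter
from typing import List, Set, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter, Set[str]]:
    """Count context and bigram frequencies, with <s> marking sentence start."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent
        vocab.update(sent)
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, vocab


def surprisal(sentence: List[str], contexts: Counter, bigrams: Counter,
              vocab: Set[str]) -> List[float]:
    """Per-word surprisal -log2 P(w_i | w_{i-1}) under add-one smoothing."""
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words (an assumption)
    tokens = ["<s>"] + sentence
    return [
        -math.log2((bigrams[(prev, w)] + 1) / (contexts[prev] + v))
        for prev, w in zip(tokens, tokens[1:])
    ]
```

With a toy corpus of "the dog barks" and "the cat sleeps", "barks" after "cat" receives higher surprisal than "barks" after "dog". This follows because the first bigram never occurs in training.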

