  • This paper reports the development of log-linear models for disambiguation in wide-coverage HPSG parsing. Estimating log-linear models is computationally expensive, especially with wide-coverage grammars. Using techniques to reduce the estimation cost, we trained the models on 20 sections of the Penn Treebank. A series of experiments empirically evaluated the estimation techniques and examined the performance of the disambiguation models on the parsing of real-world sentences. ...

  • We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training.

  • In this paper we present a human-based evaluation of surface realisation alternatives. We examine the relative rankings of naturally occurring corpus sentences and automatically generated strings chosen by statistical models (language model, log-linear model), as well as the naturalness of the strings chosen by the log-linear model. We also investigate to what extent preceding context affects the choice. We show that native speakers accept considerable variation in word order, but that there are also clear factors that make certain realisation alternatives more natural. ...

  • We investigate the influence of information status (IS) on constituent order in German, and integrate our findings into a log-linear surface realisation ranking model. We show that the distribution of pairs of IS categories is strongly asymmetric. Moreover, each category is correlated with morphosyntactic features, which can be automatically detected. We build a log-linear model that incorporates these asymmetries for ranking German string realisations from input LFG F-structures.

  • Paraphrase patterns are useful in paraphrase recognition and generation. In this paper, we present a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby the English paraphrase patterns are extracted using the sentences in a foreign language as pivots. We propose a log-linear model to compute the paraphrase likelihood of two patterns and exploit feature functions based on maximum likelihood estimation (MLE) and lexical weighting (LW).

  • In Semantic Role Labeling (SRL), it is reasonable to assign semantic roles globally because of the strong dependencies among arguments. Some relations between arguments significantly characterize the structural information of argument structure. In this paper, we concentrate on the thematic hierarchy, a rank relation that restricts the syntactic realization of arguments. A log-linear model is proposed to accurately identify the thematic rank between two arguments.

  • Despite much recent progress on accurate semantic role labeling, previous work has largely used independent classifiers, possibly combined with separate label sequence models via Viterbi decoding. This stands in stark contrast to the linguistic observation that a core argument frame is a joint structure, with strong dependencies between arguments. We show how to build a joint model of argument frames, incorporating novel features that model these interactions into discriminative log-linear models. ...

  • In this paper we present several extensions of MARIE, a freely available N-gram-based statistical machine translation (SMT) decoder. The extensions mainly consist of the ability to accept and generate word graphs and the introduction of two new N-gram models in the log-linear combination of feature functions that the decoder implements. Additionally, the decoder is enhanced with a caching strategy that reduces the number of N-gram calls, improving the overall search efficiency. Experiments are carried out on the European Parliament Spanish-English translation task. ...

