  • In this study, a novel approach to robust dialogue act detection for error-prone speech recognition in a spoken dialogue system is proposed. First, partial sentence trees are proposed to represent a speech recognition output sentence. Semantic information and the derivation rules of the partial sentence trees are extracted and used to model the relationship between the dialogue acts and the derivation rules.

  • The present paper describes a robust approach for abbreviating terms. First, in order to incorporate non-local information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model and, alternatively, the label encoding approach with global information. Although the two approaches compete with one another, we demonstrate that they are also complementary. Experiments revealed that, by combining these two approaches, the proposed abbreviation generator achieved the best results for both Chinese and English. ...

  • We describe the design and function of a robust processing component which is being developed for the Verbmobil speech translation system. Its task consists of collecting partial analyses of an input utterance produced by three parsers and attempting to combine them into more meaningful, larger units.

  • We describe a novel method for coping with ungrammatical input based on the use of chart-like data structures, which permit anytime processing. Priority is given to deep syntactic analysis. Should this fail, the best partial analyses are selected, according to a shortest-paths algorithm, and assembled in a robust processing phase. The method has been applied in a speech translation project with large HPSG grammars.

  • Existing word similarity measures are not robust to data sparseness since they rely only on the point estimation of words’ context profiles obtained from a limited amount of data. This paper proposes a Bayesian method for robust distributional word similarities. The method uses a distribution of context profiles obtained by Bayesian estimation and takes the expectation of a base similarity measure under that distribution.

  • We present a pointwise approach to Japanese morphological analysis (MA) that ignores structure information during learning and tagging. Despite the lack of structure, it is able to outperform the current state-of-the-art structured approach for Japanese MA, and achieves accuracy similar to that of structured predictors using the same feature set. We also find that the method is robust to out-of-domain data and can be easily adapted through a combination of partial annotation and active learning. ...

  • Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap.

  • We present a novel PCFG-based architecture for robust probabilistic generation based on wide-coverage LFG approximations (Cahill et al., 2004) automatically extracted from treebanks, maximising the probability of a tree given an f-structure. We evaluate our approach using string-based evaluation. We currently achieve coverage of 95.26%, a BLEU score of 0.7227 and string accuracy of 0.7476 on the Penn-II WSJ Section 23 sentences of length ≤20.

  • We describe the use of XML tokenisation, tagging and mark-up tools to prepare a corpus for parsing. Our techniques are generally applicable, but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage, but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon.

  • In this paper we present results on developing robust natural language interfaces by combining shallow and partial interpretation with dialogue management. The key issue is to minimize the effort needed to adapt the knowledge sources for parsing and interpretation. In the paper we identify different types of information and present corresponding computational models. The approach utilizes an automatically generated lexicon which is updated with information from a corpus of simulated dialogues. The grammar is developed manually from the same knowledge sources. ...

  • Most traditional approaches to anaphora resolution rely heavily on linguistic and domain knowledge. One of the disadvantages of developing a knowledge-based system, however, is that it is a very labour-intensive and time-consuming task. This paper presents a robust, knowledge-poor approach to resolving pronouns in technical manuals, which operates on texts pre-processed by a part-of-speech tagger.

  • We discuss an interactive approach to robust interpretation in a large scale speech-to-speech translation system. Where other interactive approaches to robust interpretation have depended upon domain dependent repair rules, the approach described here operates efficiently without any such hand-coded repair knowledge and yields a 37% reduction in error rate over a corpus of noisy sentences.

  • This paper describes new and improved techniques which help a unification-based parser to process input efficiently and robustly. In combination these methods result in a speed-up in parsing time of more than an order of magnitude. The methods are correct in the sense that none of them rules out legal rule applications. ... and Schäfer, 1994; Krieger and Schäfer, 1995) and an advanced agenda-based bottom-up chart parser (Kiefer and Scherf, 1996).

  • We developed a prototype information retrieval system which uses advanced natural language processing techniques to enhance the effectiveness of traditional keyword-based document retrieval. The backbone of our system is a statistical retrieval engine which performs automated indexing of documents, then search and ranking in response to user queries. This core architecture is augmented with advanced natural language processing tools which are both robust and efficient.

  • In this work, we present an experimental analysis of a dialogue system for automating simple telephone services. From the evaluation of a preliminary version of the system, we conclude that it is necessary to design a robust and flexible system that can apply different dialogue control strategies depending on the characteristics of the user and the performance of the speech recognition module. Experimental results following the PARADISE framework show a substantial improvement in both task success and dialogue cost for the proposed system. ...

  • We propose an improved, bottom-up method for converting CCG derivations into PTB-style phrase structure trees. In contrast with past work (Clark and Curran, 2009), which used simple transductions on category pairs, our approach uses richer transductions attached to single categories. Our conversion preserves more sentences under round-trip conversion (51.1% vs. 39.6%) and is more robust.

  • This paper describes a series of experiments to test the hypothesis that the parallel application of multiple NLP tools and the integration of their results improves the correctness and robustness of the resulting analysis. It is shown how annotations created by seven NLP tools are mapped onto tool-independent descriptions that are defined with reference to an ontology of linguistic annotations, and how a majority vote and ontological consistency constraints can be used to integrate multiple alternative analyses of the same token in a consistent way. ...

  • This paper presents an empirical study on the robustness and generalization of two alternative role sets for semantic role labeling: PropBank numbered roles and VerbNet thematic roles. By testing a state-of-the-art SRL system with the two alternative role annotations, we show that the PropBank role set is more robust to the lack of verb-specific semantic information and generalizes better to infrequent and unseen predicates. Keeping in mind that thematic roles are better for application needs, we also tested the best way to generate VerbNet annotation. ...

  • This work presents an agenda-based approach to improve the robustness of the dialog manager by using dialog examples and n-best recognition hypotheses. This approach supports n-best hypotheses in the dialog manager and keeps track of the dialog state using a discourse interpretation algorithm with the agenda graph and focus stack. Given the agenda graph and n-best hypotheses, the system can predict the next system actions to maximize multi-level score functions. To evaluate the proposed method, a spoken dialog system for a building guidance robot was developed.

  • We propose a robust method of automatically constructing a bilingual word sense dictionary from readily available monolingual ontologies by using expectation-maximization, without any annotated training data or manual tuning. We demonstrate our method on the English FrameNet and Chinese HowNet structures. Owing to the robustness of EM iterations in improving translation likelihoods, our word sense translation accuracy is very high, averaging 82% for the 11 most ambiguous words in the English FrameNet, each with 5 senses or more. ...
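
The chart-based method for ungrammatical input selects the best partial analyses according to a shortest-paths algorithm. The sketch below is a minimal illustration, assuming that each partial analysis is a chart edge spanning input positions `start` to `end` with a hypothetical cost (lower is better), and that the goal is the cheapest sequence of adjacent edges covering the whole input. The `Edge` type, the labels and the cost values are illustrative assumptions, not the paper's actual scoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A partial analysis spanning input positions [start, end)."""
    start: int
    end: int
    label: str
    cost: float  # hypothetical cost; lower means a better analysis


def best_cover(n: int, edges: list[Edge]) -> tuple[float, list[Edge]]:
    """Cheapest sequence of adjacent edges covering positions 0..n.

    Chart vertices 0..n form a DAG (every edge points left to right), so a
    single left-to-right relaxation pass computes the shortest path.
    """
    inf = float("inf")
    dist = [inf] * (n + 1)
    back: list[Edge | None] = [None] * (n + 1)
    dist[0] = 0.0
    # Group edges by their start vertex so vertices are relaxed in order.
    by_start: dict[int, list[Edge]] = {}
    for e in edges:
        by_start.setdefault(e.start, []).append(e)
    for v in range(n + 1):
        if dist[v] == inf:
            continue  # vertex not reachable from the start of the input
        for e in by_start.get(v, []):
            if dist[v] + e.cost < dist[e.end]:
                dist[e.end] = dist[v] + e.cost
                back[e.end] = e
    if dist[n] == inf:
        raise ValueError("no sequence of partial analyses covers the input")
    # Follow back-pointers from the final vertex to recover the edges.
    path: list[Edge] = []
    v = n
    while v > 0:
        e = back[v]
        assert e is not None
        path.append(e)
        v = e.start
    return dist[n], path[::-1]
```

For a three-word input with no full parse, the edges `NP(0,1)` at cost 1.0 and `VP(1,3)` at cost 1.5 (total 2.5) beat `NP(0,2)` plus a fragment `X(2,3)` (total 4.5). Because the chart is acyclic, no priority queue is needed.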
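
The Bayesian word-similarity abstract replaces a point estimate of each word's context profile with a distribution over profiles and takes the expectation of a base similarity measure under that distribution. The sketch below is a minimal Monte Carlo version. It assumes a Dirichlet posterior over each word's context distribution with a symmetric prior `alpha` and uses cosine as the base measure. These are choices made for illustration, and the paper may define the posterior or compute the expectation differently (for example, in closed form).

```python
import math
import random


def sample_dirichlet(counts, alpha, rng):
    """Draw one context profile from Dirichlet(counts + alpha)."""
    draws = [rng.gammavariate(c + alpha, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def cosine(u, v):
    """Base similarity measure: cosine between two profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expected_similarity(counts_x, counts_y, alpha=1.0, samples=2000, seed=0):
    """Monte Carlo estimate of E[cosine(phi_x, phi_y)] under both posteriors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        px = sample_dirichlet(counts_x, alpha, rng)
        py = sample_dirichlet(counts_y, alpha, rng)
        total += cosine(px, py)
    return total / samples
```

With a single observed context each, for example counts `[1, 0, 0]` and `[0, 1, 0]`, the point-estimate cosine is exactly 0. The expectation stays positive because the posterior spreads some mass over unseen contexts. With large, identical counts, the expectation approaches the point-estimate value of 1.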
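
The pointwise Japanese MA abstract makes each decision independently instead of decoding a whole structure. This independence is also what lets partially annotated sentences be used directly. The sketch below covers word segmentation only. It assumes hypothetical character-window features and a plain perceptron as the classifier, since the abstract names neither the features nor the learner, and it omits POS tagging.

```python
from collections import defaultdict


def boundary_labels(words):
    """Label each gap between adjacent characters: 1 = word boundary, 0 = none."""
    labels = []
    for w in words:
        labels.extend([0] * (len(w) - 1) + [1])
    return labels[:-1]  # no decision is made after the final character


def features(text, i):
    """Hypothetical features for the gap between text[i] and text[i + 1]."""
    feats = [f"L1:{text[i]}", f"R1:{text[i + 1]}", f"B:{text[i]}{text[i + 1]}"]
    if i > 0:
        feats.append(f"L2:{text[i - 1]}")
    if i + 2 < len(text):
        feats.append(f"R2:{text[i + 2]}")
    return feats


def train(examples, epochs=10):
    """Perceptron over independent gap decisions.

    Each example is (text, labels); a label of None marks a gap left
    unannotated (partial annotation) and is simply skipped.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for text, labels in examples:
            for i, y in enumerate(labels):
                if y is None:
                    continue
                feats = features(text, i)
                pred = 1 if sum(w[f] for f in feats) > 0 else 0
                if pred != y:
                    step = 1.0 if y == 1 else -1.0
                    for f in feats:
                        w[f] += step
    return w


def segment(text, w):
    """Split text wherever the classifier predicts a boundary."""
    words, start = [], 0
    for i in range(len(text) - 1):
        if sum(w.get(f, 0.0) for f in features(text, i)) > 0:
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words
```

Because every gap is scored on its own, a sentence in which only some gaps are annotated still contributes training signal. That property allows partial annotation and active learning to be combined.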
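
The Medline parsing abstract gains robustness by interfacing POS tag information with the existing lexicon. The abstract does not describe the actual ANLT interface, so the sketch below shows only one plausible fallback scheme. It assumes a hypothetical lexicon mapping words to grammar categories and a hypothetical table from Penn Treebank-style tagger tags to default categories, which is consulted only when a word is missing from the lexicon.

```python
# Hypothetical hand-crafted lexicon: word -> set of grammar categories.
LEXICON = {
    "the": {"DET"},
    "binds": {"V"},
    "protein": {"N"},
}

# Hypothetical mapping from tagger output (Penn Treebank-style tags)
# to default categories used when a word is missing from the lexicon.
TAG_DEFAULTS = {
    "NN": {"N"}, "NNS": {"N"}, "NNP": {"N"},
    "VB": {"V"}, "VBZ": {"V"}, "VBD": {"V"},
    "JJ": {"A"},
}


def lexical_lookup(tagged):
    """Return (word, categories, source) for each (word, tag) pair.

    Known words use the lexicon; unknown words fall back to a default
    entry derived from their POS tag instead of causing a parse failure.
    """
    result = []
    for word, tag in tagged:
        entry = LEXICON.get(word.lower())
        if entry is not None:
            result.append((word, set(entry), "lexicon"))
        elif tag in TAG_DEFAULTS:
            result.append((word, set(TAG_DEFAULTS[tag]), "pos-fallback"))
        else:
            result.append((word, set(), "unknown"))
    return result
```

For example, a domain term such as "kinase" tagged `NN` receives a noun entry through the fallback, so a missing lexical entry no longer blocks the parse. Returning copies of the category sets keeps callers from accidentally mutating the shared tables.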
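
The multi-tool integration abstract combines alternative analyses of the same token by majority vote together with ontological consistency constraints. The sketch below illustrates the voting step in minimal form. It assumes that the tool outputs have already been mapped onto shared, tool-independent labels (the label names are hypothetical) and models the consistency constraint as a hypothetical set of labels allowed for the token. Ties go to the label that appears first.

```python
from collections import Counter


def integrate(votes, allowed=None):
    """Pick one label for a token from several tools' votes.

    votes:   labels proposed by the tools, already mapped onto shared,
             tool-independent descriptions.
    allowed: optional set of labels consistent with the constraints;
             votes outside it are discarded before counting.
    Returns the majority label, or None if no vote survives.
    """
    if allowed is not None:
        votes = [v for v in votes if v in allowed]
    if not votes:
        return None
    counts = Counter(votes)
    # most_common lists equal counts in first-encountered order, so ties
    # go to the label that appeared first among the surviving votes.
    return counts.most_common(1)[0][0]
```

With seven tools voting `Noun` four times, `Verb` twice and `Adj` once, the result is `Noun`. If a constraint rules out `Verb`, a single surviving `Noun` vote wins over two discarded `Verb` votes.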

