The process of language generation

  • In this paper we argue that lexical selection plays a more important role in the generation process than has commonly been assumed. To stress the importance of lexical-semantic input to generation, we explore the distinction and treatment of generating open and closed class lexical items, and suggest an additional classification of the latter into discourse-oriented and proposition-oriented items. Finally, we discuss how lexical selection is influenced by thematic (focus) information in the input. ...

  • This paper presents an analysis of purpose clauses in the context of instruction understanding. Such analysis shows that goals affect the interpretation and/or execution of actions, lends support to the proposal of using generation and enablement to model relations between actions, and sheds light on some inference processes necessary to interpret purpose clauses. A purpose clause, as its name says, expresses the agent's purpose in performing a certain action. The analysis of purpose clauses is relevant to the problem of understanding Natural Language instructions, because: ...

  • In this paper we examine the pragmatic knowledge an utterance-planning system must have in order to produce certain kinds of definite and indefinite noun phrases. An utterance-planning system, like other planning systems, plans actions to satisfy an agent's goals, but allows some of the actions to consist of the utterance of sentences. This approach to language generation emphasizes the view of language as action, and hence assigns a critical role to pragmatics.

  • We have analyzed definitions from Webster's Seventh New Collegiate Dictionary using Sager's Linguistic String Parser and again using basic UNIX text processing utilities such as grep and awk. This paper evaluates both procedures, compares their results, and discusses possible future lines of research exploiting and combining their respective strengths. As natural language systems grow more sophisticated, they need larger and more detailed lexicons.

  • In this paper we describe a framework for research into translation that draws on a combination of two existing and independently constructed technologies: an analysis component developed for German by the EUROTRA-D (ET-D) group of IAI and the generation component developed for English by the Penman group at ISI. We present some of the linguistic implications of the research and the promise it bears for furthering understanding of the translation process.

  • Collocational knowledge is necessary for language generation. The problem is that collocations come in a large variety of forms. They can involve two, three or more words, these words can be of different syntactic categories and they can be involved in more or less rigid ways. This leads to two main difficulties: collocational knowledge has to be acquired and it must be represented flexibly so that it can be used for language generation. We address both problems in this paper, focusing on the acquisition problem. We describe a program, Xtract, that automatically acquires...

  • Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. ...

  • The task of discourse generation is to produce multisentential text in natural language which (when heard or read) produces effects (informing, motivating, etc.) and impressions (conciseness, correctness, ease of reading, etc.) which are appropriate to a need or goal held by the creator of the text. Because even little children can produce multisentential text, the task of discourse generation appears deceptively easy. It is actually extremely complex, in part because it usually involves many different kinds of knowledge.

  • We argue that in domains where a strong notion of ... can be ... it can be ... Mann and Moore [1981], on the other hand, while assembling texts dynamically to suit their audience, do so by "over-generating" the set of facts that will be related, and then passing them all through a special filter, leaving out those that are judged to be already known to the audience and letting through those that are new. McKeown [1981] uses a similar technique -- her generator, like Mann and Moore's, must examine every potentially mentionable object in the domain data base and make...

  • Starting from the assumption that machine translation (MT) should be based on theoretically sound grounds, we argue that, given the state of the art, the only viable solution for the designer of software tools for MT is to provide the linguists building the MT system with a generator of highly specialized, problem-oriented systems. We propose that such theory-sensitive systems be generated automatically by supplying a set of definitions to a kernel software, of which we give an informal description in the paper. We give...

  • This paper describes the NECA MNLG, a fully implemented Multimodal Natural Language Generation module. The MNLG is deployed as part of the NECA system which generates dialogues between animated agents. The generation module supports the seamless integration of full grammar rules, templates and canned text. The generator takes input which allows for the specification of syntactic, semantic and pragmatic constraints on the output.

  • We present and evaluate a new model for Natural Language Generation (NLG) in Spoken Dialogue Systems, based on statistical planning, given noisy feedback from the current generation context (e.g. a user and a surface realiser). We study its use in a standard NLG problem: how to present information (in this case a set of search results) to users, given the complex trade-offs between utterance length, amount of information conveyed, and cognitive load. We set these trade-offs by analysing existing MATCH data.

  • We demonstrate an open-source natural language generation engine that produces descriptions of entities and classes in English and Greek from OWL ontologies that have been annotated with linguistic and user modeling information expressed in RDF. We also demonstrate an accompanying plug-in for the Protégé ontology editor, which can be used to create the ontology’s annotations and generate previews of the resulting texts by invoking the generation engine.

  • Natural language generation (NLG) systems are notoriously hard to evaluate. On the one hand, simply comparing system outputs to a gold standard is not appropriate because there can be multiple generated outputs that are equally good, and finding metrics that account for this variability and produce results consistent with human judgments and task performance measures is difficult (Belz and Gatt, 2008; Stent et al., 2005; Foster, 2008). On the other hand, lab-based evaluations with human subjects to assess each aspect of the system’s functionality are expensive and time-consuming. ...

  • This paper presents a method for the automatic extraction of subgrammars to control and speed up natural language generation (NLG). The method is based on explanation-based learning (EBL). The main advantage of the proposed new method for NLG is that the complexity of the grammatical decision-making process during NLG can be vastly reduced, because the EBL method supports the adaptation of an NLG system to a particular use of a language.

  • This paper is part of an MSc report on a program called GENIE (Generator of Inflected English), written in CProlog, that acts as a front end to an existing speech synthesis program. It allows the user to type a sentence in English text, and then processes it so that the synthesiser will output it with natural-sounding inflection; that is, as well as transcribing text to a phonemic form that can be read by the system, it assigns this text an f0 contour. The assigning of this stress is described in this paper, and it is asserted that the...

  • We present a proposal for the structuring of collocation knowledge in the lexicon of a multilingual generation system and show to what extent it can be used in the process of lexical selection. This proposal is part of Polygloss, a new research project on multilingual generation, and it has been inspired by work carried out in the SEMSYN project (see e.g. [RÖSNER 1988]). The descriptive approach presented in this proposal is based on a combination of results from recent lexicographical research and the application of Meaning-Text-Theory (MTT) (see e.g. [MEL'CUK et al.

  • We propose WIDL-expressions as a flexible formalism that facilitates the integration of a generic sentence realization system within end-to-end language processing applications. WIDL-expressions represent compactly probability distributions over finite sets of candidate realizations, and have optimal algorithms for realization via interpolation with language model probability distributions. We show the effectiveness of a WIDL-based NLG system in two sentence realization tasks: automatic translation and headline generation. ...

  • Computer-based generation of natural language requires consideration of two different types of problems: 1) determining the content and textual shape of what is to be said, and 2) transforming that message into English. A computational solution to the problems of deciding what to say and how to organize it effectively is proposed that relies on an interaction between structural and semantic processes. Schemas, which encode aspects of discourse structure, are used to guide the generation process.

  • The knowledge representation is an important factor in natural language generation since it limits the semantic capabilities of the generation system. This paper identifies several information types in a knowledge representation that can be used to generate meaningful responses to questions about database structure. Creating such a knowledge representation, however, is a long and tedious process. A system is presented which uses the contents of the database to form part of this knowledge representation automatically. ...

