Discourse analysis considers how language, both spoken and written, enacts
social and cultural perspectives and identities. Assuming no prior knowledge
of linguistics, An Introduction to Discourse Analysis examines the field and
presents James Paul Gee’s unique integrated approach, which incorporates
both a theory of language-in-use and a method of research.
Software to translate English text into American Sign Language (ASL) animation can improve information accessibility for the majority of deaf adults with limited English literacy. ASL natural language generation (NLG) is a special form of multimodal NLG that uses multiple linguistic output channels. ASL NLG technology has applications for the generation of gesture animation and other communication signals that are not easily encoded as text strings.
Mobile interfaces need to allow the user and system to adapt their choice of communication modes according to user preferences, the task at hand, and the physical and social environment. We describe a multimodal application architecture which combines finite-state multimodal language processing, a speech-act-based multimodal dialogue manager, dynamic multimodal output generation, and user-tailored text planning to enable rapid prototyping of multimodal interfaces with flexible input and adaptive output. ...
The growing popularity of multimedia documents requires language technologies to approach automatic language analysis and generation from yet another perspective: that of its use in multimodal communication. In this paper, we present a support tool for COSMOROE, a theoretical framework for modelling multimedia dialectics. The tool is a text-based search interface that facilitates the exploration of a corpus of audiovisual files, annotated with the COSMOROE relations.
Multimodal interfaces combining, e.g., natural language and graphics take advantage of both the individual strength of each communication mode and the fact that several modes can be employed in parallel, e.g., in the text-picture combinations of illustrated documents. It is an important goal of this research not simply to merge the verbalization results of a natural language generator and the visualization results of a knowledge-based graphics generator, but to carefully coordinate graphics and text in such a way that they complement each other. ...
We discuss Image Sense Discrimination (ISD), and apply a method based on spectral clustering, using multimodal features from the image and text of the embedding web page. We evaluate our method on a new data set of annotated web images, retrieved with ambiguous query terms. Experiments investigate different levels of sense granularity, as well as the impact of text and image features, and global versus local text features.
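The clustering step named in this abstract can be illustrated with a minimal spectral bipartition over concatenated image and text feature vectors. This is a simplified two-sense sketch with made-up numbers; the paper's actual features, affinity construction, and sense granularities differ:

```python
import numpy as np

def spectral_bipartition(features, sigma=1.0):
    """Split items into two clusters via the Fiedler vector of an RBF graph."""
    # Pairwise squared distances between the combined feature vectors.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    affinity = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Symmetric normalised Laplacian: I - D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    lap = np.eye(len(features)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # The sign pattern of the second-smallest eigenvector partitions the graph.
    _, vecs = np.linalg.eigh(lap)
    return (vecs[:, 1] > 0).astype(int)

# Toy "web images" for an ambiguous query: each row concatenates image and
# text features (illustrative numbers only, not real data).
feats = np.array([[0.9, 0.8, 0.1], [1.0, 0.7, 0.2],
                  [0.1, 0.2, 0.9], [0.2, 0.1, 1.0]])
labels = spectral_bipartition(feats)
```

The first two rows land in one cluster and the last two in the other; which cluster gets label 0 is arbitrary, as usual with spectral methods.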
We describe how context-sensitive, user-tailored output is specified and produced in the COMIC multimodal dialogue system. At the conference, we will demonstrate the user-adapted features of the dialogue manager and text planner. ... We will focus on how context-sensitive, user-tailored output is generated in the third, guided-browsing phase of the interaction. Figure 2 shows a typical user request and response from COMIC in this phase.
In this talk, we will show how techniques for planning text and discourse can be generalized to plan the structure and content of multimodal communications that integrate natural language, pointing, graphics, and animations. The central claim of this talk is that the generation of multimodal discourse can be considered as an incremental planning process that aims to achieve a given communicative goal.
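The incremental planning view can be sketched as goal decomposition: a communicative goal expands through rules into primitive acts, each realized in an output mode. The rule set and act names below are purely hypothetical, invented for illustration; they are not from the talk:

```python
# Hypothetical decomposition rules: each communicative goal expands into
# sub-goals or into primitive acts realized in a named output mode.
RULES = {
    "describe-object": ["show-picture", "state-attributes"],
    "state-attributes": ["say-colour", "say-size"],
}
PRIMITIVES = {
    "show-picture": "graphics",
    "say-colour": "text",
    "say-size": "text",
}

def plan(goal):
    """Incrementally expand a communicative goal into (act, mode) steps."""
    if goal in PRIMITIVES:
        return [(goal, PRIMITIVES[goal])]
    steps = []
    for sub in RULES[goal]:
        steps.extend(plan(sub))
    return steps

steps = plan("describe-object")
# [('show-picture', 'graphics'), ('say-colour', 'text'), ('say-size', 'text')]
```

The point of the sketch is only the control structure: language, graphics, and pointing acts fall out of one plan rather than being merged after separate generation.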
This collective work deals with the analysis of audiovisual digital texts or
corpora, which may, for example, form part of an audiovisual library or archive.
The development of methods, tools and conceptual frameworks (or models) for
the concrete analysis of audiovisual texts or corpora is one of the most important
issues for multimedia (audiovisual) digital libraries, archives and collections,
and for any project or programme that compiles and disseminates knowledge
heritage (e.g. cultural or scientific).
This paper presents a probabilistic framework that combines multiple knowledge sources for Haptic Voice Recognition (HVR), a multimodal input method designed to provide efficient text entry on modern mobile devices. HVR extends the conventional voice input by allowing users to provide complementary partial lexical information via touch input to improve the efficiency and accuracy of voice recognition.
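The idea of combining voice and touch evidence can be sketched by rescoring an ASR N-best list against tapped word initials. The paper describes a probabilistic combination of knowledge sources; the version below is a deliberately simplified hard-constraint filter, with invented hypothesis data:

```python
import math

def rescore_with_initials(nbest, initials):
    """Combine ASR hypothesis scores with partial lexical evidence.

    nbest    -- list of (transcript, log_prob) pairs from a voice recogniser
    initials -- initial letters the user tapped, one per intended word
    """
    rescored = []
    for text, logp in nbest:
        words = text.split()
        # Hard constraint: the hypothesis must match the tapped initials.
        if len(words) == len(initials) and all(
                w[0].lower() == c.lower() for w, c in zip(words, initials)):
            rescored.append((text, logp))
    # Fall back to the raw N-best list if nothing is consistent.
    return sorted(rescored or nbest, key=lambda p: -p[1])

nbest = [("recognise speech", math.log(0.6)),
         ("wreck a nice beach", math.log(0.4))]
best, _ = rescore_with_initials(nbest, ["r", "s"])[0]
# best == "recognise speech"
```

A probabilistic variant would soften the constraint into a likelihood term added to each hypothesis score rather than a filter.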
This paper describes the NECA MNLG; a fully implemented Multimodal Natural Language Generation module. The MNLG is deployed as part of the NECA system which generates dialogues between animated agents. The generation module supports the seamless integration of full grammar rules, templates and canned text. The generator takes input which allows for the specification of syntactic, semantic and pragmatic constraints on the output.
This paper explores the relationship between discourse segmentation and coverbal gesture. Introducing the idea of gestural cohesion, we show that coherent topic segments are characterized by homogeneous gestural forms and that changes in the distribution of gestural features predict segment boundaries. Gestural features are extracted automatically from video, and are combined with lexical features in a Bayesian generative model. The resulting multimodal system outperforms text-only segmentation on both manual and automatically recognized speech transcripts. ...
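The cohesion intuition behind this abstract can be sketched without the full Bayesian model: score each candidate boundary by how sharply the distribution of features (quantised gesture codes or words) changes across it. The feature names and window sizes below are illustrative assumptions, not the paper's setup:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two feature-count bags."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def boundary_scores(units):
    """Score each gap between units by the cohesion dip across it.

    units -- per-utterance bags of features (words, or quantised gesture codes)
    """
    bags = [Counter(u) for u in units]
    scores = []
    for i in range(1, len(bags)):
        left = sum(bags[max(0, i - 2):i], Counter())   # window before the gap
        right = sum(bags[i:i + 2], Counter())          # window after the gap
        scores.append(1.0 - cosine(left, right))       # high = likely boundary
    return scores

# Two homogeneous runs of (made-up) gesture codes with a topic shift between.
units = [["hand", "circle", "circle"], ["hand", "circle"],
         ["point", "left"], ["point", "point", "left"]]
scores = boundary_scores(units)
# The largest dip falls between the two homogeneous runs of gestural forms.
```

The generative model in the paper plays the same role as this similarity dip, but infers segment boundaries jointly from lexical and gestural feature distributions.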