In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then examine how using ASR output instead of human transcripts affects performance.
Chapter 1 introduces intravascular ultrasound (IVUS). IVUS images are a unique
tool for guiding interventional coronary procedures: the technique makes it possible
to examine cross-sections of the vessel morphology and provides
quantitative and qualitative information about the causes and severity of
coronary disease. At present, however, this information is extracted automatically
without taking into account the basic signal principles
that govern image generation. ...
In Chapter 1 we present in detail a framework for fully automated brain tissue
classification. The framework consists of a sequence of fully automated,
state-of-the-art image registration (both rigid and nonrigid) and image segmentation
algorithms. Models of the spatial distribution of brain tissues are combined with
models of expected tissue intensities, including correction of MR bias fields and
estimation of partial voluming. We also demonstrate how this framework can
be applied in the presence of lesions. ...
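Combining a spatial model of tissue location with a model of expected intensities is commonly done with Bayes' rule at each voxel. The sketch below is an illustration under stated assumptions, not the framework's actual implementation: the atlas prior for the voxel is assumed to be given (i.e. registration is already done), each tissue class (`csf`, `gm`, `wm`) gets a hypothetical Gaussian intensity model, and bias-field correction and partial voluming are left out.

```python
import math

# Hypothetical intensity model per tissue class: (mean, standard deviation).
# A real pipeline would estimate these from the bias-corrected image itself.
INTENSITY_MODEL = {"csf": (30.0, 10.0), "gm": (80.0, 10.0), "wm": (120.0, 10.0)}


def gaussian_pdf(x, mean, std):
    """Normal density N(x; mean, std**2)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def tissue_posterior(intensity, spatial_prior, model=INTENSITY_MODEL):
    """Posterior P(tissue | intensity) at one voxel.

    spatial_prior maps tissue -> prior probability read from a registered
    atlas at this voxel; the likelihood is one Gaussian per tissue class.
    Bias-field correction and partial voluming are not modelled here.
    """
    joint = {t: spatial_prior.get(t, 0.0) * gaussian_pdf(intensity, *model[t])
             for t in model}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()} if total > 0 else joint
```

For example, `tissue_posterior(100.0, {"csf": 0.1, "gm": 0.3, "wm": 0.6})` favours `wm`: the intensity lies halfway between the grey- and white-matter means, so the atlas prior breaks the tie.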
Hindawi Publishing Corporation, EURASIP Journal on Image and Video Processing, Volume 2010, Article ID 814319, 12 pages, doi:10.1155/2010/814319
Research Article
Automatic Segmentation and Inpainting of Specular Highlights for Endoscopic Imaging
Mirko Arnold, Anarta Ghosh, Stefan Ameling, and Gerard Lacey
School of Computer Science and Statistics, Trinity College, Dublin, Ireland
Correspondence should be addressed to Anarta Ghosh.
In this paper, we propose a novel method for automatically segmenting a Sanskrit string into words. The input to our segmenter is a Sanskrit string, encoded either in Unicode or in Roman transliteration, and the output is a set of possible splits, each with an associated weight.
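To make the input/output contract concrete, here is a minimal sketch that enumerates every way to split a transliterated string into words from a lexicon and weights each split. Everything here is an assumption for illustration: the toy `LEXICON` and its probabilities are invented, the weight is simply a normalised product of word probabilities, and sandhi (the sound changes at Sanskrit word junctions that a real segmenter must undo) is ignored entirely.

```python
from functools import lru_cache

# Hypothetical toy lexicon: word -> relative frequency (illustrative values only).
LEXICON = {"rama": 0.4, "ram": 0.1, "a": 0.2, "gacchati": 0.3}


def weighted_splits(text, lexicon):
    """Return every split of `text` into lexicon words, with normalised weights.

    A split's raw score is the product of its word probabilities; scores are
    normalised to sum to 1. Sandhi is ignored: words are assumed to be
    concatenated without any change at their boundaries.
    """
    @lru_cache(maxsize=None)
    def splits_from(i):
        # All (words, score) analyses of the suffix text[i:].
        if i == len(text):
            return [((), 1.0)]
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in lexicon:
                for rest, score in splits_from(j):
                    results.append(((word,) + rest, lexicon[word] * score))
        return results

    candidates = splits_from(0)
    total = sum(score for _, score in candidates)
    ranked = [(list(words), score / total) for words, score in candidates]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With the toy lexicon, `weighted_splits("ramagacchati", LEXICON)` returns two analyses, `rama gacchati` ranked above `ram a gacchati`. Memoising on the start index keeps the search over suffixes from being repeated.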
Automatic segmentation is important for making multimedia archives comprehensible and for developing downstream information retrieval and extraction modules. In this study, we explore approaches that can segment multiparty conversational speech by integrating various knowledge sources (e.g., words, audio and video recordings, speaker intention and context). In particular, we evaluate the performance of a Maximum Entropy approach and examine the effectiveness of multimodal features for dialogue segmentation. ...
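A binary Maximum Entropy classifier over candidate boundary positions is equivalent to logistic regression. The sketch below shows that model in plain Python under stated assumptions: the feature names (`pause`, `speaker_change`) are hypothetical stand-ins for the multimodal cues mentioned above, and batch gradient ascent with an L2 penalty is used for brevity, which is not necessarily how the study trained its model.

```python
import math


def prob_boundary(weights, feats):
    """P(boundary | features) under a binary log-linear (MaxEnt) model."""
    score = weights.get("bias", 0.0)
    score += sum(weights.get(name, 0.0) * value for name, value in feats.items())
    # Numerically stable logistic function.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def train_maxent(examples, epochs=500, lr=0.1, l2=0.01):
    """Fit weights by batch gradient ascent on the L2-penalised log-likelihood.

    examples: list of (feature_dict, label), label 1 = segment boundary.
    """
    weights = {"bias": 0.0}
    for _ in range(epochs):
        grad = {name: -l2 * w for name, w in weights.items()}
        for feats, label in examples:
            err = label - prob_boundary(weights, feats)  # observed minus expected
            grad["bias"] += err
            for name, value in feats.items():
                grad[name] = grad.get(name, 0.0) + err * value
        for name, g in grad.items():
            weights[name] = weights.get(name, 0.0) + lr * g
    return weights
```

The gradient is the difference between observed and model-expected feature values, which is the MaxEnt constraint-matching condition written as an update rule.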
After completing this lesson, you should be able to do the following:
Describe the concept of automatic undo management
Create and maintain an automatically managed undo tablespace
Set the undo retention period
Use dynamic performance views to check rollback segment performance
Reconfigure and monitor rollback segments
Define the number and sizes of rollback segments
Allocate rollback segments to transactions
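When setting the retention period, the undo tablespace has to be large enough to hold the undo generated during that period. A commonly cited sizing rule of thumb is UndoSpace = UR × UPS × DBS: the retention period in seconds (UNDO_RETENTION) times undo blocks generated per second times the database block size (DB_BLOCK_SIZE). The helpers below are a hedged sketch of that arithmetic; the function names are hypothetical, the per-interval UNDOBLKS samples are assumed to come from V$UNDOSTAT (10-minute intervals), and metadata overhead is ignored.

```python
def peak_undo_blocks_per_second(undoblks_per_interval, interval_seconds=600):
    """Highest undo-generation rate across V$UNDOSTAT-style samples.

    Each sample is the UNDOBLKS value for one interval (10 minutes by default).
    """
    return max(undoblks_per_interval) / interval_seconds


def estimate_undo_bytes(undo_retention_s, undo_blocks_per_s, db_block_size):
    """Rough undo tablespace size: UR * UPS * DBS, excluding metadata overhead."""
    return undo_retention_s * undo_blocks_per_s * db_block_size
```

For example, with illustrative samples `[6000, 12000, 3000]` the peak rate is 20 blocks/s, and a 900 s retention with 8 KB blocks gives 900 × 20 × 8192 = 147,456,000 bytes (about 141 MB). Using the peak rather than the average sizes for the busiest interval.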
Then, when a segment containing the prime in less than minimal combination is presented for identification, its location in cue space lies within a restricted number of units of within-cluster variance of the central location of the prime cluster. The number of such distance units determines headedness in the segment, with separate thresholds for occurrence as head and as operator. In § 3 we describe in more detail the stagewise procedure for identifying, via quadratic discriminants, the primes present in segments. ...
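The thresholding idea can be sketched as follows, with several assumptions made explicit: covariance is taken to be diagonal (a simplification of a full quadratic discriminant, which would also use a covariance matrix and a log-determinant term), "units of within-cluster variance" is interpreted as standard-deviation units, and the thresholds `head_max` and `operator_max` are hypothetical, with a closer match assumed to indicate occurrence as head.

```python
import math


def standardized_distance(cues, centroid, variances):
    """Distance from a segment's cue vector to a prime's cluster centre,
    in within-cluster standard deviations (diagonal covariance assumed)."""
    return math.sqrt(sum((x - m) ** 2 / v
                         for x, m, v in zip(cues, centroid, variances)))


def prime_role(cues, centroid, variances, head_max=1.5, operator_max=3.0):
    """Classify a prime's role in a segment by distance thresholds.

    head_max and operator_max are hypothetical; the assumption is that a
    closer match to the prime cluster indicates occurrence as head.
    """
    d = standardized_distance(cues, centroid, variances)
    if d <= head_max:
        return "head"
    if d <= operator_max:
        return "operator"
    return "absent"
```

A segment whose cues sit 2 standard deviations from the prime's centre would, with these illustrative thresholds, be labelled as containing the prime as operator.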
Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks, such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines or standards. This represents a considerable waste of human effort, and it would be useful to adapt one annotation standard to another automatically. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with the desired annotation.
In statistical machine translation (SMT), the word segmentation judged most accurate against a human gold-standard segmentation may not produce the best translation output (Zhang et al., 2008). Although state-of-the-art Chinese word segmenters achieve high accuracy, some errors remain.
Identification of transliterated names is a particularly difficult Named Entity Recognition (NER) task, especially in the Chinese context. Of all variations of transliterated named entities, the difference between PRC and Taiwan usage is the most prevalent and the most challenging. In this paper, we introduce a novel approach to automatically extracting diverging transliterations of foreign named entities by bootstrapping co-occurrence statistics from a tagged and segmented Chinese corpus. A preliminary experiment yields promising results and shows the approach's potential for NLP applications. ...
To segment texts into thematic units, we show here how a basic principle relying on word distribution can be applied to different kinds of texts. We start from an existing method well adapted to scientific texts, and we propose adapting it to other kinds of texts by using semantic links between words. These relations are found in a lexical network automatically built from a large corpus. We compare the results of the methods and give criteria for choosing the most suitable one according to text characteristics. ...
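A common way to apply the word-distribution principle is to compare word counts on either side of each gap between sentences and to place boundaries where similarity dips. The sketch below is a TextTiling-style illustration of that principle, not necessarily the existing method the abstract starts from; the block size, threshold and whitespace tokenisation are assumptions, and the semantic links from a lexical network are not included.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block=2):
    """Similarity of the `block` sentences before and after each gap."""
    tokens = [s.lower().split() for s in sentences]
    sims = []
    for gap in range(1, len(tokens)):
        left = Counter(w for s in tokens[max(0, gap - block):gap] for w in s)
        right = Counter(w for s in tokens[gap:gap + block] for w in s)
        sims.append(cosine(left, right))
    return sims


def find_boundaries(sentences, block=2, threshold=0.1):
    """Sentence indices starting a new unit: local similarity minima below threshold."""
    sims = gap_similarities(sentences, block)
    result = []
    for i, s in enumerate(sims):
        left = sims[i - 1] if i > 0 else float("inf")
        right = sims[i + 1] if i + 1 < len(sims) else float("inf")
        if s < threshold and s <= left and s <= right:
            result.append(i + 1)  # boundary falls before sentence i + 1
    return result
```

On two sentences about a cat followed by two about stocks, the only gap with no shared vocabulary is the one between them, so `find_boundaries` returns `[2]`.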
A method is presented for segmenting text into subtopic areas. The lexical similarity of adjacent windows of text is determined by calculating the proportion of related word pairs between them. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are located automatically using a combination of three linguistic features: word repetition, collocation and relation weights. The method successfully detects known subject changes in text, and its segmentations correspond well to those placed by test subjects. ...
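One reading of "the proportion of related word pairs between adjacent windows" is the fraction of words in either window that have a related word in the other window, with low values suggesting a subtopic change. The sketch below uses that interpretation. Relatedness covers only repetition plus membership in a hypothetical, hand-supplied collocation list; the relation weights mentioned above, and the automatic discovery of collocations, are omitted.

```python
def related(a, b, collocations):
    """Repetition, or a pair listed in a (hypothetical) collocation set."""
    return a == b or frozenset((a, b)) in collocations


def window_cohesion(left, right, collocations=frozenset()):
    """Proportion of words in both windows related to some word in the other window."""
    if not left or not right:
        return 0.0
    linked = sum(any(related(w, v, collocations) for v in right) for w in left)
    linked += sum(any(related(v, w, collocations) for w in left) for v in right)
    return linked / (len(left) + len(right))


def cohesion_profile(words, window=4, collocations=frozenset()):
    """Cohesion score at every gap between adjacent word positions."""
    return [window_cohesion(words[max(0, i - window):i], words[i:i + window], collocations)
            for i in range(1, len(words))]
```

Candidate boundaries are then the gaps where the profile reaches its lowest values. Storing collocations as `frozenset` pairs makes the relation symmetric without listing both orders.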
In general, a span of sentences in a text is widely assumed to form a coherent unit called a discourse segment. Identifying segment boundaries is a first step toward recognizing the structure of a text. In this paper, we describe a method for identifying segment boundaries in a Japanese text with the aid of multiple surface linguistic cues, although our experiments are small-scale. We also present a method for automatically training the weights of the multiple linguistic cues while avoiding overfitting. ...
Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without using any lexicon or hand-crafted linguistic resources. The statistical data required by the algorithm, namely the mutual information and the difference of t-scores between characters, are derived automatically from raw Chinese corpora. A preliminary experiment shows that the segmentation accuracy of the algorithm is acceptable.
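The mutual-information half of this idea can be sketched in a few lines: estimate character and character-pair frequencies from raw text, then keep adjacent characters together when their pointwise mutual information is high and split where it is low. This is a simplified illustration, not the paper's algorithm: the t-score difference is omitted, the threshold is hypothetical, and unseen pairs are simply treated as boundaries.

```python
from collections import Counter
import math


def train_counts(corpus_lines):
    """Character unigram and adjacent-bigram counts from raw, unsegmented text."""
    unigrams, bigrams = Counter(), Counter()
    for line in corpus_lines:
        unigrams.update(line)
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
    return unigrams, bigrams


def mutual_information(a, b, unigrams, bigrams):
    """Pointwise MI, log2 P(ab) / (P(a) P(b)); -inf for unseen pairs."""
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    if n_bi == 0 or bigrams[a + b] == 0:
        return float("-inf")
    p_ab = bigrams[a + b] / n_bi
    return math.log2(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))


def segment(text, unigrams, bigrams, threshold=0.0):
    """Split between adjacent characters whose MI does not exceed `threshold`."""
    if not text:
        return []
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        if mutual_information(a, b, unigrams, bigrams) > threshold:
            current += b
        else:
            words.append(current)
            current = b
    words.append(current)
    return words
```

With a toy corpus of three lines `中国` and three lines `人民`, the pair 中国 has MI = log2(0.5 / 0.25²) = 3 bits while 国人 is unseen, so `segment("中国人民", ...)` yields `["中国", "人民"]`.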
This paper explores the relationship between discourse segmentation and coverbal gesture. Introducing the idea of gestural cohesion, we show that coherent topic segments are characterized by homogeneous gestural forms and that changes in the distribution of gestural features predict segment boundaries. Gestural features are extracted automatically from video and are combined with lexical features in a Bayesian generative model. The resulting multimodal system outperforms text-only segmentation on both manual and automatically recognized speech transcripts. ...
This paper presents a Bayesian decision framework that performs automatic story segmentation based on statistical modeling of one or more lexical chain features. Automatic story segmentation aims to locate the points in time where one story ends and another begins. A lexical chain is formed by linking related lexical items in chronological order. A story boundary is often associated with a significant number of lexical chains ending before it and starting after it, and with a low count of chains continuing through it.
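The three chain counts described above can be computed at each candidate boundary and fed into a simple Bayesian decision. The sketch below assumes chains are given as (first, last) sentence or utterance indices rather than times, and it models each count with an independent Poisson distribution per class (a naive-Bayes simplification). The rates in `RATES` and the priors in `PRIORS` are hypothetical placeholders that would in practice be estimated from annotated stories; this is not the paper's exact model.

```python
import math


def chain_features(chains, gap):
    """(ending, starting, continuing) counts for the gap before sentence `gap`.

    chains: list of (first_sentence, last_sentence) index pairs.
    """
    ending = sum(1 for s, e in chains if e == gap - 1)
    starting = sum(1 for s, e in chains if s == gap)
    continuing = sum(1 for s, e in chains if s < gap <= e)
    return ending, starting, continuing


def poisson_logpmf(k, lam):
    """log P(k) for a Poisson distribution with rate lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


# Hypothetical Poisson rates for (ending, starting, continuing) per class.
RATES = {"boundary": (3.0, 3.0, 0.5), "non_boundary": (0.5, 0.5, 3.0)}
PRIORS = {"boundary": 0.1, "non_boundary": 0.9}


def is_story_boundary(chains, gap, rates=RATES, priors=PRIORS):
    """Decide boundary vs. non-boundary by comparing log posteriors."""
    feats = chain_features(chains, gap)

    def log_post(cls):
        return math.log(priors[cls]) + sum(
            poisson_logpmf(k, lam) for k, lam in zip(feats, rates[cls]))

    return log_post("boundary") > log_post("non_boundary")
```

With chains `[(0, 2), (1, 2), (0, 2), (3, 5), (3, 6), (4, 6)]`, the gap before sentence 3 has three chains ending, two starting and none continuing, so it is classified as a story boundary, while the gap before sentence 1 is not.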
We investigate different feature sets for performing automatic sentence-level discourse segmentation within a general machine learning approach, including features derived from either finite-state or context-free annotations. We achieve the best reported performance on this task, and demonstrate that our SPADE-inspired context-free features are critical to achieving this level of accuracy. This counters recent results suggesting that purely finite-state approaches can perform competitively.
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content, using a text-based algorithm as a feature, with knowledge about form, using linguistic and acoustic cues to topic shifts extracted from speech. The algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and performs comparably to state-of-the-art algorithms based on lexical information.
This article outlines a new method of locating discourse boundaries based on lexical cohesion and a graphical technique called dotplotting. The application of dotplotting to discourse segmentation can be performed either manually, by examining a graph, or automatically, using an optimization algorithm. The results of two experiments involving automatically locating boundaries between a series of concatenated documents are presented. Areas of application and future directions for this work are also outlined.
Introduction
In general, texts are "about" some topic. ...
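A dotplot marks position pair (i, j) whenever the words at positions i and j are identical; coherent segments then show up as dense squares along the diagonal. The sketch below automates boundary placement under simplifying assumptions: the objective (density of dots linking different segments, to be minimised) is a simplified variant, boundaries are added greedily one at a time rather than by whatever optimization the article uses, and the number of boundaries is supplied by the caller.

```python
from itertools import combinations


def dotplot(words):
    """Set of (i, j), i < j, with words[i] == words[j] (upper triangle of the dotplot)."""
    return {(i, j) for i, j in combinations(range(len(words)), 2) if words[i] == words[j]}


def outside_density(n, dots, bounds):
    """Dots linking different segments divided by the number of such position pairs."""
    edges = [0] + sorted(bounds) + [n]
    # seg[p] is the index of the segment containing position p.
    seg = [k for k in range(len(edges) - 1) for _ in range(edges[k], edges[k + 1])]
    cross = sum(1 for i, j in dots if seg[i] != seg[j])
    sizes = [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]
    area = (n * n - sum(s * s for s in sizes)) // 2  # unordered cross-segment pairs
    return cross / area if area else 0.0


def dotplot_segment(words, num_boundaries):
    """Greedily add the boundaries that most reduce off-diagonal dot density."""
    dots, n, bounds = dotplot(words), len(words), []
    for _ in range(num_boundaries):
        candidates = [b for b in range(1, n) if b not in bounds]
        if not candidates:
            break
        best = min(candidates, key=lambda b: outside_density(n, dots, bounds + [b]))
        bounds.append(best)
    return sorted(bounds)
```

For the word sequence `a b a b c d c d`, every repeated word stays within its half, so `dotplot_segment(words, 1)` places the single boundary at position 4, where no dots cross between segments.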