An approach to automatic detection of syllable boundaries is presented. We demonstrate the use of several manually constructed grammars trained with a novel algorithm combining the advantages of treebank and bracketed corpora training. We investigate the effect of the training corpus size on the performance of our system. The evaluation shows that a hand-written grammar performs better on ﬁnding syllable boundaries than does a treebank grammar.
In pro-drop languages, the detection of explicit subjects, zero subjects and nonreferential impersonal constructions is crucial for anaphora and co-reference resolution. While the identiﬁcation of explicit and zero subjects has attracted the attention of researchers in the past, the automatic identiﬁcation of impersonal constructions in Spanish has not been addressed yet and this work is the ﬁrst such study.
We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing ﬁlter for a coreference resolution system. We also report results of an annotation study dealing with the classiﬁcation of it by naive subjects.
Annotated corpora are essential for almost all NLP applications. Whereas they are expected to be of a very high quality because of their importance for the followup developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the amount of errors and propose a method for their automatic correction.
This paper proposes an automatic method of detecting grammar elements that decrease readability in a Japanese sentence. The method consists of two components: (1) the check list of the grammar elements that should be detected; and (2) the detector, which is a search program of the grammar elements from a sentence. By deﬁning a readability level for every grammar element, we can ﬁnd which part of the sentence is difﬁcult to read.
Tuyển tập báo cáo các nghiên cứu khoa học quốc tế ngành hóa học dành cho các bạn yêu hóa học tham khảo đề tài: Research Article Automatic Detection and Recognition of Tonal Bird Sounds in Noisy Environments
Most of previous approaches to automatic prosodic event detection are based on supervised learning, relying on the availability of a corpus that is annotated with the prosodic labels of interest in order to train the classiﬁcation models. However, creating such resources is an expensive and time-consuming task. In this paper, we exploit semi-supervised learning with the co-training algorithm for automatic detection of coarse level representation of prosodic events such as pitch accents, intonational phrase boundaries, and break indices. ...
Tuyển tập báo cáo các nghiên cứu khoa học quốc tế ngành hóa học dành cho các bạn yêu hóa học tham khảo đề tài: A radial basis classifier for the automatic detection of aspiration in children with dysphagia
We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classiﬁer was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection.
Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in positional annotation (e.g., partof-speech) and continuous structural annotation (e.g., syntactic constituency), no approach has yet been developed for automatically detecting annotation errors in discontinuous structural annotation.
Semantic relations between text concepts denote the core elements of lexical semantics. This paper presents a model for the automatic detection of INTENTION semantic relation. Our approach ﬁrst identiﬁes the syntactic patterns that encode intentions, then we select syntactic and semantic features for a SVM learning classiﬁer. In conclusion, we discuss the application of INTENTION relations to Q&A.
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. We present here criteria and techniques for automatically detecting the presence of a repair, its location, and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics. INTRODUCTION Spontaneous spoken language often includes speech that is not intended by the speaker to be part of the content of the utterance. ...
We apply topic modelling to automatically induce word senses of a target word, and demonstrate that our word sense induction method can be used to automatically detect words with emergent novel senses, as well as token occurrences of those senses. We start by exploring the utility of standard topic models for word sense induction (WSI), with a pre-determined number of topics (=senses). We next demonstrate that a non-parametric formulation that learns an appropriate number of senses per word actually performs better at the WSI task. ...
In information retrieval, genre classification could enable users to sort search results according to their immediate interests. People who go into a bookstore or library are not usually looking simply for information about a particular topic, but rather have requirements of genre as well: they are looking for scholarly articles about hypnotism, novels about the French Revolution, editorials about the supercollider, and so forth.
We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.
(BQ) Part 2 book "Dermoscopy image analysis" presents the following contents: Dermoscopy image assessment based on perceptible color regions, accurate and scalable system for automatic detection of malignant melanoma, from dermoscopy to mobile teledermatology,...
Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors.