Whether automatically extracted or human-generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into hierarchically organized, concept-level annotations, which capture not only the broad semantics of the desired arguments (e.g., ‘People’ rather than ‘Locations’), but also the correct level of generality (e.g. ...
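As a rough illustration of lifting instance-level arguments to an appropriate level of generality, the sketch below walks a toy is-a hierarchy and returns the most specific concept that covers a set of instances. The hierarchy and all names are invented for the example and are not taken from the paper's actual method.

```python
# Hypothetical is-a hierarchy: child -> parent edges.
parent = {
    "george gershwin": "composers", "duke ellington": "composers",
    "composers": "musicians", "musicians": "people", "people": "entity",
    "madonna": "singers", "singers": "musicians",
}

def ancestors(node):
    """Return the node and its chain of ancestors, most specific first."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def most_specific_common_concept(instances):
    """Intersect ancestor chains and keep the lowest shared concept."""
    common = set(ancestors(instances[0]))
    for inst in instances[1:]:
        common &= set(ancestors(inst))
    # The first shared node in a most-specific-first chain is the answer.
    for node in ancestors(instances[0]):
        if node in common:
            return node

concept = most_specific_common_concept(["george gershwin", "duke ellington"])
# 'composers' rather than the overly general 'people'
```

The same routine returns the more general 'musicians' once an instance outside the composer subtree (e.g. "madonna") is added, which is the level-of-generality behavior the abstract describes.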
Data-driven approaches in computational semantics are not common because few semantically annotated resources are available. We are building a large corpus of public-domain English texts and annotating it semi-automatically with syntactic structures (derivations in Combinatory Categorial Grammar) and semantic representations (Discourse Representation Structures), including events, thematic roles, named entities, anaphora, scope, and rhetorical structure. We have created a wiki-like Web-based platform on which a crowd of expert annotators (i.e. ...
“Lightweight” semantic annotation of text calls for a simple representation, ideally without requiring a semantic lexicon to achieve good coverage in the language and domain. In this paper, we repurpose WordNet’s supersense tags for annotation, developing specific guidelines for nominal expressions and applying them to Arabic Wikipedia articles in four topical domains.
We describe the ongoing construction of a large, semantically annotated corpus resource as a reliable basis for the large-scale acquisition of word-semantic information, e.g. the construction of domain-independent lexica. The backbone of the annotation is semantic roles in the frame-semantics paradigm. We report experiences and evaluate the annotated data from the first project stage. On this basis, we discuss the problems of vagueness and ambiguity in semantic annotation.
We present a technique for automatic induction of slot annotations for subcategorization frames, based on induction of hidden classes in the EM framework of statistical estimation. The models are empirically evaluated by a general decision test. Induction of slot labeling for subcategorization frames is accomplished by a further application of EM, and applied experimentally on frame observations derived from parsing large corpora. We outline an interpretation of the learned representations as theoretical-linguistic decompositional lexical entries. ...
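The latent-class induction described above can be sketched as a small EM loop over (verb, noun) frame observations, where a hidden class c generates both arguments via p(c), p(v|c), and p(n|c). The data, class count, and parameterization below are illustrative, not the paper's actual model.

```python
import random
from collections import defaultdict

# Hypothetical toy observations: (verb, head-noun) pairs from parsed frames.
pairs = [("eat", "apple"), ("eat", "bread"), ("drink", "water"),
         ("drink", "beer"), ("eat", "pizza"), ("drink", "juice")]

verbs = sorted({v for v, _ in pairs})
nouns = sorted({n for _, n in pairs})
K = 2  # number of hidden classes

random.seed(0)
# Random initial parameters: p(c), p(v|c), p(n|c), each normalized.
p_c = [1.0 / K] * K
p_v = [{v: random.random() for v in verbs} for _ in range(K)]
p_n = [{n: random.random() for n in nouns} for _ in range(K)]
for c in range(K):
    zv = sum(p_v[c].values()); p_v[c] = {v: x / zv for v, x in p_v[c].items()}
    zn = sum(p_n[c].values()); p_n[c] = {n: x / zn for n, x in p_n[c].items()}

for _ in range(50):
    # E-step: posterior over hidden classes for each observed pair.
    counts_c = [0.0] * K
    counts_v = [defaultdict(float) for _ in range(K)]
    counts_n = [defaultdict(float) for _ in range(K)]
    for v, n in pairs:
        post = [p_c[c] * p_v[c][v] * p_n[c][n] for c in range(K)]
        z = sum(post)
        for c in range(K):
            g = post[c] / z
            counts_c[c] += g
            counts_v[c][v] += g
            counts_n[c][n] += g
    # M-step: re-estimate parameters from expected counts.
    total = sum(counts_c)
    p_c = [x / total for x in counts_c]
    p_v = [{v: counts_v[c][v] / counts_c[c] for v in verbs} for c in range(K)]
    p_n = [{n: counts_n[c][n] / counts_c[c] for n in nouns} for c in range(K)]
```

On separable data like this, the two latent classes tend to converge to an "eat-like" and a "drink-like" selectional class, which is the kind of decompositional entry the abstract alludes to.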
The paper describes a cross-lingual document retrieval system in the medical domain that employs a controlled vocabulary (UMLS) in constructing an XML-based intermediary representation into which queries as well as documents are mapped. The system assists in the retrieval of English and German medical scientific abstracts relevant to a German query document (electronic patient record). The modularity of the system allows for deployment in other domains, given appropriate linguistic and semantic resources. ...
I am delighted to see a book on multimedia semantics covering metadata, analysis, and interaction edited by three very active researchers in the field: Troncy, Huet, and Schenk. This is one of those projects that are very difficult to complete because the field is advancing rapidly in many different dimensions. At any time, you feel that many important emerging areas may not be covered well unless you see the next important conference in the field. A state-of-the-art book remains a moving, often elusive, target. But this is only a part of the dilemma.
Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology for the target language.
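Cross-lingual transfer of semantic annotations is often realized as annotation projection: source-side labels are carried over to target tokens through word alignments. The sketch below shows the core step with invented data; real systems must additionally handle unaligned and many-to-many alignments.

```python
# Hypothetical aligned sentence pair with source-side role labels.
source = ["Peter", "sold", "the", "car"]
src_labels = {0: "Agent", 3: "Theme"}          # source token index -> role
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]   # (source_i, target_j) links
target = ["Peter", "verkaufte", "das", "Auto"]

# Project each labeled source token's role onto its aligned target token.
projected = {}
for i, j in alignment:
    if i in src_labels:
        projected[j] = src_labels[i]
```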
This paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts. Prior to retrieval, all sentences are annotated with predicate argument structures and ontological identifiers by applying a deep parser and a term recognizer. At run time, user requests are converted into queries of region algebra on these annotations. Structural matching with pre-computed semantic annotations establishes the accurate and efficient retrieval of relational concepts. This framework was applied to a text retrieval system for MEDLINE. ...
There is little consensus on a standard experimental design for the compound interpretation task. This paper introduces well-motivated general desiderata for semantic annotation schemes, and describes such a scheme for in-context compound annotation accompanied by detailed publicly available guidelines. Classification experiments on an open-text dataset compare favourably with previously reported results and provide a solid baseline for future research.
The problem we address in this paper is that of providing contextual examples of translation equivalents for words from the general lexicon using comparable corpora and semantic annotation that is uniform for the source and target languages. For a sentence, phrase, or query expression in the source language, the tool detects the semantic type of the situation in question and gives examples of similar contexts from the target-language corpus.
In data-oriented language processing, an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new sentence is constructed by combining fragments from the corpus in the most probable way. This approach has been successfully used for syntactic analysis, using corpora with syntactic annotations such as the Penn Treebank. If a corpus with semantically annotated sentences is used, the same approach can also generate the most probable semantic interpretation of an input sentence. The present paper explains this semantic interpretation method. ...
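In the data-oriented paradigm, each corpus fragment gets a probability equal to its relative frequency among fragments with the same root, and a derivation's probability is the product of its fragments' probabilities. The minimal sketch below illustrates this with invented fragment counts (fragments are simplified to flat root/expansion pairs).

```python
from collections import Counter

# Hypothetical fragment counts extracted from an annotated corpus,
# keyed by (root label, expansion).
fragment_counts = Counter({
    ("S", "NP VP"): 8, ("S", "NP VP PP"): 2,
    ("NP", "Det N"): 6, ("NP", "N"): 4,
    ("VP", "V NP"): 5, ("VP", "V"): 5,
})

# Total count of fragments per root label, for relative frequencies.
root_totals = Counter()
for (root, _), c in fragment_counts.items():
    root_totals[root] += c

def fragment_prob(frag):
    """Relative frequency of a fragment among fragments with its root."""
    root, _ = frag
    return fragment_counts[frag] / root_totals[root]

def derivation_prob(fragments):
    """DOP: derivation probability = product of fragment probabilities."""
    p = 1.0
    for f in fragments:
        p *= fragment_prob(f)
    return p

p = derivation_prob([("S", "NP VP"), ("NP", "Det N"),
                     ("VP", "V NP"), ("NP", "N")])
# 0.8 * 0.6 * 0.5 * 0.4 = 0.096
```

With semantic rather than syntactic fragments, the same product yields the probability of a semantic interpretation, as the abstract notes.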
Supervised semantic role labeling (SRL) systems trained on hand-crafted annotated corpora have recently achieved state-of-the-art performance. However, creating such corpora is tedious and costly, with the resulting corpora not sufficiently representative of the language. This paper describes part of ongoing work on applying bootstrapping methods to SRL to deal with this problem. Previous work shows that, due to the complexity of SRL, this task is not straightforward.
We describe a new approach to disambiguating semantic frames evoked by lexical predicates previously unseen in a lexicon or annotated data. Our approach makes use of large amounts of unlabeled data in a graph-based semi-supervised learning framework. We construct a large graph where vertices correspond to potential predicates and use label propagation to learn possible semantic frames for new ones.
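Label propagation of the kind described above can be sketched on a tiny predicate graph: seed predicates carry known frame labels, and label distributions diffuse along weighted similarity edges until convergence. The graph, weights, and frame names below are illustrative, not the paper's data.

```python
from collections import defaultdict

# Hypothetical symmetric similarity edges between predicates.
edges = {
    ("devour", "eat"): 1.0, ("eat", "consume"): 0.8,
    ("sprint", "run"): 1.0, ("run", "jog"): 0.9,
}
labels = {"eat": "Ingestion", "run": "Self_motion"}  # seed frame labels
frames = sorted(set(labels.values()))

adj = defaultdict(list)
for (a, b), w in edges.items():
    adj[a].append((b, w)); adj[b].append((a, w))

# Initialize label distributions: seeds are one-hot and stay clamped.
nodes = list(adj)
dist = {v: {f: (1.0 if labels.get(v) == f else 0.0) for f in frames}
        for v in nodes}

for _ in range(20):
    new = {}
    for v in nodes:
        if v in labels:          # clamp seed nodes
            new[v] = dist[v]
            continue
        # Aggregate neighbors' distributions, weighted by edge similarity.
        agg = {f: sum(w * dist[u][f] for u, w in adj[v]) for f in frames}
        z = sum(agg.values()) or 1.0
        new[v] = {f: x / z for f, x in agg.items()}
    dist = new

best = {v: max(dist[v], key=dist[v].get) for v in nodes}
# e.g. 'devour' acquires the frame of its neighbor 'eat'
```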
The task of Semantic Role Labeling (SRL) is often divided into two sub-tasks: verb argument identification, and argument classification. Current SRL algorithms show lower results on the identification sub-task. Moreover, most SRL algorithms are supervised, relying on large amounts of manually created data. In this paper we present an unsupervised algorithm for identifying verb arguments, where the only type of annotation required is POS tagging.
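To make the POS-only setting concrete, here is a deliberately naive heuristic in the same spirit: maximal noun-phrase-like chunks adjacent to the predicate are proposed as candidate arguments. The tag set follows Penn Treebank conventions; the chunking rule is invented for illustration and is far simpler than the paper's algorithm.

```python
# POS tags treated as noun-phrase material (Penn Treebank tags).
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "PRP"}

def candidate_arguments(tagged, verb_index):
    """tagged: list of (word, pos); verb_index: position of the predicate.
    Returns candidate argument chunks left and right of the verb."""
    def chunk(indices):
        words = []
        for i in indices:
            if tagged[i][1] in NP_TAGS:
                words.append(tagged[i][0])
            elif words:
                break   # stop at the first gap after the chunk starts
        return " ".join(words) if words else None

    left = chunk(range(verb_index - 1, -1, -1))   # scan leftwards
    right = chunk(range(verb_index + 1, len(tagged)))
    if left:
        left = " ".join(reversed(left.split()))   # restore word order
    return [c for c in (left, right) if c]

sent = [("The", "DT"), ("dog", "NN"), ("chased", "VBD"),
        ("the", "DT"), ("red", "JJ"), ("ball", "NN")]
args = candidate_arguments(sent, 2)
# ['The dog', 'the red ball']
```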
We have developed a novel, publicly available annotation tool for the semantic encoding of texts, especially those in the narrative domain. Users can create formal propositions to represent spans of text, as well as temporal relations and other aspects of narrative. A built-in natural-language generation component regenerates text from the formal structures, which eases the annotation process. We have run collection experiments with the tool and shown that non-experts can easily create semantic encodings of short fables. ...
In this paper, we introduce a corpus of consumer reviews from the rateitall and the eopinions websites annotated with opinion-related information. We present a two-level annotation scheme. In the first stage, the reviews are analyzed at the sentence level for (i) relevancy to a given topic, and (ii) expressing an evaluation about the topic.
Negation is present in all human languages and it is used to reverse the polarity of part of statements that are otherwise affirmative by default. A negated statement often carries positive implicit meaning, but separating the positive part from the negative part is rather difficult. This paper aims at thoroughly representing the semantics of negation by revealing implicit positive meaning. The proposed representation relies on focus of negation detection. For this, new annotation over PropBank and a learning algorithm are proposed. ...
Compositional question answering begins by mapping questions to logical forms, but training a semantic parser to perform this mapping typically requires the costly annotation of the target logical forms. In this paper, we learn to map questions to answers via latent logical forms, which are induced automatically from question-answer pairs. In tackling this challenging learning problem, we introduce a new semantic representation which highlights a parallel between dependency syntax and efficient evaluation of logical forms. ...
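The core of learning from question-answer pairs can be illustrated by filtering candidate logical forms against denotations: keep only candidates whose evaluation over the database yields the annotated answer. The database, candidate generator, and form representation below are invented toys, not the paper's representation.

```python
# Hypothetical relational database: relation -> {argument: value}.
db = {"capital": {"france": "paris", "italy": "rome"},
      "population": {"paris": 2100000, "rome": 2800000}}

def evaluate(form):
    """Evaluate a (relation, argument) logical form against the database."""
    relation, arg = form
    return db.get(relation, {}).get(arg)

def candidate_forms(question):
    """Naive candidate generation: pair every relation with every
    database entity mentioned in the question."""
    words = question.lower().split()
    return [(rel, w) for rel in db for w in words if w in db[rel]]

def consistent_forms(question, answer):
    """Latent-form filtering: keep candidates whose denotation matches."""
    return [f for f in candidate_forms(question) if evaluate(f) == answer]

forms = consistent_forms("what is the capital of france", "paris")
# surviving candidates act as (latent) supervision for the parser
```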
Current approaches for semantic parsing take a supervised approach requiring a considerable amount of training data which is expensive and difficult to obtain. This supervision bottleneck is one of the major difficulties in scaling up semantic parsing. We argue that a semantic parser can be trained effectively without annotated data, and introduce an unsupervised learning algorithm. The algorithm takes a self-training approach driven by confidence estimation.
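The confidence-driven self-training loop has a simple generic shape: a seed model labels unlabeled items, and only predictions above a confidence threshold are moved into the training set before retraining. The toy keyword classifier, data, and threshold below stand in for the paper's parser and estimator, purely for illustration.

```python
from collections import defaultdict

def train(labeled):
    """Count per-class word frequencies (the toy 'model')."""
    freq = defaultdict(lambda: defaultdict(int))
    for text, label in labeled:
        for w in text.split():
            freq[label][w] += 1
    return freq

def predict(model, text):
    """Return the best label and a crude normalized-score confidence."""
    scores = {label: sum(words.get(w, 0) for w in text.split())
              for label, words in model.items()}
    total = sum(scores.values()) or 1.0
    label = max(scores, key=scores.get)
    return label, scores[label] / total

labeled = [("book a flight to boston", "travel"),
           ("play some jazz music", "music")]
unlabeled = ["a flight to denver", "jazz guitar music", "hello there"]

THRESHOLD = 0.8
for _ in range(2):  # a couple of self-training rounds
    model = train(labeled)
    for text in list(unlabeled):
        label, conf = predict(model, text)
        if conf >= THRESHOLD:          # promote only confident predictions
            labeled.append((text, label))
            unlabeled.remove(text)
# low-confidence items like "hello there" are never promoted
```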