In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second-dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that first identifies these structures and then, within them, identifies the semantic type of the relation.
Machine learning approaches have been developed to address relation extraction, which is the task of extracting semantic relations between entities expressed in text. Supervised approaches are limited in scalability because labeled data is expensive to produce. A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009).
Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performance. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.
We present a simple semi-supervised relation extraction system with large-scale word clustering. We focus on systematically exploring the effectiveness of different cluster-based features. We also propose several statistical methods for selecting clusters at an appropriate level of granularity. When training on different sizes of data, our semi-supervised approach consistently outperformed a state-of-the-art supervised baseline system.
In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable supervision.
Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision.
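The distant supervision heuristic described above can be sketched in a few lines: any sentence that mentions both entities of a knowledge-base pair is treated as a (noisy) positive example for that pair's relation. The knowledge base, sentences, and relation names below are all hypothetical toy stand-ins, not data from the paper.

```python
# Minimal sketch of distant supervision. The KB is reduced to a dict of
# entity-pair -> relation (a stand-in for a Freebase-style resource).
kb = {
    ("Barack Obama", "Honolulu"): "place_of_birth",
    ("Steve Jobs", "Apple"): "founder_of",
}

sentences = [
    "Barack Obama was born in Honolulu .",
    "Steve Jobs started Apple in a garage .",
    "Honolulu is a city in Hawaii .",
]

def distant_label(sentences, kb):
    """Label any sentence containing both entities of a KB pair as a
    noisy positive training example for that pair's relation."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

for ex in distant_label(sentences, kb):
    print(ex)
```

The noise is visible even in this toy: any sentence that happens to mention both entities is labeled, whether or not it actually expresses the relation, which is why DS systems typically aggregate evidence over many sentences per pair.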
Creating labeled training data for relation extraction is expensive. In this paper, we study relation extraction in a special weakly-supervised setting when we have only a few seed instances of the target relation type we want to extract but we also have a large amount of labeled instances of other relation types. Observing that different relation types can share certain common structures, we propose to use a multi-task learning method coupled with human guidance to address this weakly-supervised relation extraction problem. ...
This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using the respective characteristics of Wikipedia articles and a Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information on the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. ...
Shortage of manually labeled data is an obstacle to supervised relation extraction methods. In this paper we investigate a graph-based semi-supervised learning algorithm, a label propagation (LP) algorithm, for relation extraction. It represents labeled and unlabeled examples and their distances as the nodes and edge weights of a graph, and tries to obtain a labeling function satisfying two constraints: 1) it should be fixed on the labeled nodes, and 2) it should be smooth over the whole graph. ...
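The two constraints in the abstract map directly onto the classic iterative scheme: propagate labels along weighted edges (smoothness), then clamp the labeled nodes back to their known labels (fixedness). A minimal sketch on a toy chain graph, assuming a Zhu–Ghahramani-style formulation (the graph and labels below are illustrative, not from the paper):

```python
import numpy as np

def label_propagation(W, Y, labeled, iters=100):
    """Graph-based label propagation.
    W: (n, n) symmetric edge-weight matrix,
    Y: (n, k) one-hot rows for labeled nodes (zero rows for unlabeled),
    labeled: boolean mask of labeled nodes.
    Repeatedly propagate, then clamp labeled nodes to their true labels."""
    T = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = T @ F                 # smoothness: average over neighbors
        F[labeled] = Y[labeled]   # fixedness: clamp the labeled nodes
    return F.argmax(axis=1)

# Toy chain graph 0-1-2-3: nodes 0 and 3 are labeled (classes 0 and 1);
# nodes 1 and 2 receive labels through propagation.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
labeled = np.array([True, False, False, True])
print(label_propagation(W, Y, labeled))  # each unlabeled node takes its nearer seed's class
```

For relation extraction, each node would be a candidate entity-pair mention, and edge weights would come from a similarity measure over feature vectors rather than a hand-built adjacency matrix.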
Kernel based methods dominate the current trend for various relation extraction tasks including protein-protein interaction (PPI) extraction. PPI information is critical in understanding biological processes. Despite considerable efforts, previously reported PPI extraction results show that none of the approaches already known in the literature is consistently better than other approaches when evaluated on different benchmark PPI corpora.
Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. ...
Relation extraction is the task of finding semantic relations between two entities from text. In this paper, we propose a novel feature-based Chinese relation extraction approach that explicitly defines and explores nine positional structures between two entities. We also suggest some correction and inference mechanisms based on the relation hierarchy, co-reference information, etc. The approach is effective when evaluated on the ACE 2005 Chinese data set.
The automatic extraction of relations between entities expressed in natural language text is an important problem for IR and text understanding. In this paper we show how different kernels for parse trees can be combined to improve the relation extraction quality. On a public benchmark dataset the combination of a kernel for phrase grammar parse trees and for dependency parse trees outperforms all known tree kernel approaches alone suggesting that both types of trees contain complementary information for relation extraction. ...
This paper proposes a novel hierarchical learning strategy to deal with the data sparseness problem in relation extraction by modeling the commonality among related classes. For each class in the hierarchy, either manually predefined or automatically clustered, a linear discriminative function is determined in a top-down way using a perceptron algorithm, with the lower-level weight vector derived from the upper-level weight vector.
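The top-down derivation can be illustrated by warm-starting each child perceptron from its parent's learned weights, so sparse child classes inherit the commonality captured at the upper level. The two-level hierarchy, class names, and 2-D data below are hypothetical illustrations, not the paper's setup:

```python
import numpy as np

def train_perceptron(X, y, w_init, epochs=20):
    """Binary perceptron (labels +1/-1) warm-started from w_init."""
    w = w_init.astype(float).copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # misclassified (or on the boundary)
                w = w + yi * xi
    return w

# Hypothetical 2-D examples: the parent class is separated by x[0] > 0,
# and the child classes further split the parent's region by x[1].
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 0.5], [-1.0, -0.5]])
y_parent = np.array([1, 1, -1, -1])   # e.g. PHYSICAL vs. other
w_parent = train_perceptron(X, y_parent, np.zeros(2))

# Top-down step: the child perceptron starts from the parent's weight
# vector instead of zeros, inheriting the upper-level commonality.
y_child = np.array([1, -1, -1, -1])   # e.g. "Located" vs. rest
w_child = train_perceptron(X, y_child, w_parent)
```

In the sparse-data regime the paper targets, this warm start matters: a child class with few examples begins from a decision boundary already roughly oriented by its better-populated parent.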
Most information extraction systems either use hand written extraction patterns or use a machine learning algorithm that is trained on a manually annotated corpus. Both of these approaches require massive human effort and hence prevent information extraction from becoming more widely applicable. In this paper we present URES (Unsupervised Relation Extraction System), which extracts relations from the Web in a totally unsupervised way.
Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of the entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, instead relying on general-purpose shallow parsing. Such systems have greater applicability, because they are able to extract relations that contain attributes of unknown types. However, this generality comes at a cost in accuracy.
Extracting semantic relationships between entities is challenging. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using SVM. Our study illustrates that base phrase chunking information is very effective for relation extraction and contributes most of the performance improvement from the syntactic aspect, while additional information from full parsing gives limited further enhancement.
The annotated mentions in the Corpus are single or multi-word expressions which refer to a particular real world or abstract entity. The mentions are annotated to indicate sets of mentions which constitute co-reference groups referring to the same entity. Five relationships are annotated between these entities: PartOf, FeatureOf, Produces, InstanceOf, and MemberOf. One significant difference between these relation annotations and those in the ACE Corpus is that the former are relations between sets of mentions (the co-reference groups) rather than between individual mentions....
A complex relation is any n-ary relation in which some of the arguments may be unspecified. We present here a simple two-stage method for extracting complex relations between named entities in text. The first stage creates a graph from pairs of entities that are likely to be related, and the second stage scores maximal cliques in that graph as potential complex relation instances. We evaluate the new method against a standard baseline for extracting genomic variation relations from biomedical text.
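The two stages above can be sketched directly: stage one yields a set of entity pairs judged likely to be related, and stage two enumerates maximal cliques in the resulting graph as candidate complex-relation instances. The Bron–Kerbosch enumeration is a standard choice assumed here (the paper does not specify it), and the genomic entity names and the size-based score are hypothetical stand-ins:

```python
def maximal_cliques(adj):
    """Bron-Kerbosch enumeration of maximal cliques in an undirected
    graph given as {node: set of neighbours}."""
    cliques = []
    def bk(R, P, X):
        if not P and not X:
            cliques.append(R)  # R cannot be extended: it is maximal
            return
        for v in list(P):
            bk(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)
            X.add(v)
    bk(set(), set(adj), set())
    return cliques

# Stage 1 (hypothetical output): entity pairs a binary classifier
# judged likely to be related.
related_pairs = {("gene:KRAS", "mut:G12D"), ("mut:G12D", "dis:cancer"),
                 ("gene:KRAS", "dis:cancer"), ("gene:TP53", "dis:cancer")}
adj = {n: set() for pair in related_pairs for n in pair}
for a, b in related_pairs:
    adj[a].add(b)
    adj[b].add(a)

# Stage 2: each maximal clique is a candidate complex-relation instance,
# here ranked by size as a stand-in for a learned clique score.
for clique in sorted(maximal_cliques(adj), key=len, reverse=True):
    print(sorted(clique))
```

On this toy graph the triangle gene:KRAS / mut:G12D / dis:cancer surfaces as a 3-ary candidate, while the lone gene:TP53 / dis:cancer edge remains a binary one, which is exactly the behavior that lets the method recover n-ary relations with unspecified arguments.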
Taking this route sets up a dual goal: (a) from the generic paraphrasing perspective - an objective evaluation of paraphrase acquisition performance on a concrete application dataset, as well as identifying the additional mechanisms needed to match paraphrases in texts; (b) from the RE perspective - investigating the feasibility and performance of a generic paraphrase-based approach for RE. Our configuration assumes a set of entailing templates (non-symmetric "paraphrases") for the target relation.