Research over the past decade has demonstrated the feasibility of extracting topographic information
of hydrological interest directly from digital elevation models (DEM). Techniques are
available for extracting slope properties, catchment areas, drainage divides, channel networks and
other data (Jenson and Domingue, 1988; Mark, 1988; Moore et al., 1991; Martz and Garbrecht,
1992). These techniques are faster and provide more precise and reproducible measurements than
traditional manual techniques applied to topographic maps (Tribe, 1991)....
From before the time Raven stole the sun and shed light on the world below,
the Gitxaal /
a people have lived in their territories along the north coast of
British Columbia. Gitxaal /
a laws (Ayaawk) and history (Adaawk) describe in
precise detail the relationships of trust, honor, and respect that are appro-
priate for the well-being and continuance of the people and, as important-
ly, define the rights of ownership over land, sea, and resources within the
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping.
Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of specific lexica definition, but they introduce lexical ambiguity. This paper proposes a methodology for building application-specific lexica by using WordNet. Lexical ambiguity is kept under control by marking synsets in WordNet with field labels taken from the Dewey Decimal Classification. tion requirement.
Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performances. To build a relation extractor without signiﬁcant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.
An important and well-studied problem is the production of semantic lexicons from a large corpus. In this paper, we present a system named ASIA (Automatic Set Instance Acquirer), which takes in the name of a semantic class as input (e.g., “car makers”) and automatically outputs its instances (e.g., “ford”, “nissan”, “toyota”). ASIA is based on recent advances in webbased set expansion - the problem of ﬁnding all instances of a set given a small number of “seed” instances.
Automatically acquiring synonymous collocation pairs such as and from corpora is a challenging task. For this task, we can, in general, have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or use bilingual corpora alone are apparently inadequate because of low precision or low coverage. In this paper, we propose a method that uses both these resources to get an optimal compromise of precision and coverage.
In this demo we will present GATE, an architecture and framework for language engineering, and ANNIE, an information extraction system developed within it. We will demonstrate how ANNIE has been adapted to perform NE recognition in different languages, including Indic and Slavonic languages as well as Western European ones, and how the resources can be reused for new applications and languages.
This paper presents a tool for extracting multi-word expressions from corpora in Modern Greek, which is used together with a parallel concordancer to augment the lexicon of a rule-based machinetranslation system. The tool is part of a larger extraction system that relies, in turn, on a multilingual parser developed over the past decade in our laboratory. The paper reviews the various NLP modules and resources which enable the retrieval of Greek multi-word expressions and their translations: the Greek parser, its lexical database, the extraction and concordancing system. ...
In this paper, we compare three different generalization methods for in-domain and cross-domain opinion holder extraction being simple unsupervised word clustering, an induction method inspired by distant supervision and the usage of lexical resources. The generalization methods are incorporated into diverse classiﬁers. We show that generalization causes signiﬁcant improvements and that the impact of improvement depends on the type of classiﬁer and on how much training and test data differ from each other. ...
We propose a novel approach to improve SMT via paraphrase rules which are automatically extracted from the bilingual training data. Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side.
The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources) which are much more widely available than parallel translation data.
Negative life events play an important role in triggering depressive episodes. Developing psychiatric services that can automatically identify such events is beneficial for mental health care and prevention. Before these services can be provided, some meaningful semantic patterns, such as , have to be extracted. In this work, we present a text mining framework capable of inducing variable-length semantic patterns from unannotated psychiatry web resources.
At least two kinds of relations exist among related words: taxonomical relations and thematic relations. Both relations identify related words useful to language understanding and generation, information retrieval, and so on. However, although words with taxonomical relations are easy to identify from linguistic resources such as dictionaries and thesauri, words with thematic relations are difficult to identify because they are rarely maintained in linguistic resources.
We demonstrate TextRank – a system for unsupervised extractive summarization that relies on the application of iterative graphbased ranking algorithms to graphs encoding the cohesive structure of a text. An important characteristic of the system is that it does not rely on any language-speciﬁc knowledge resources or any manually constructed training data, and thus it is highly portable to new languages or domains.
We present an approach using syntactosemantic rules for the extraction of relational information from biomedical abstracts. The results show that by overcoming the hurdle of technical terminology, high precision results can be achieved. From abstracts related to baker’s yeast, we manage to extract a regulatory network comprised of 441 pairwise relations from 58,664 abstracts with an accuracy of 83–90%. To achieve this, we made use of a resource of gene/protein names considerably larger than those used in most other biology related information extraction approaches. ...
database maintained by the National Library of Medicine1 (NLM), which incorporates around 40,000 Health Sciences papers each month. Researchers depend on these electronic resources to keep abreast of their rapidly changing field. In order to maintain and update vital indexing references such as the Unified Medical Language System (UMLS) resources, the MeSH and SPECIALIST vocabularies, the NLM staff needs to review 400,000 highly-technical papers each year.
In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG categorybased subcategorisation frames as well as mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs.
Foreign Direct Investment (FDI) – investment by foreign companies in overseas subsidiaries or
joint ventures – has a traditional reliance on natural resource use and extraction,particularly
agriculture, mineral and fuel production. Though this balance has shifted in recent years, the poorest countries still receive a disproportionate amount of investment flows into their natural resource sectors.
The rapid economic growth in Vietnam has resulted in an increasing demand for
electricity. This in turn translates to a higher rate of coal resource extraction and
consequent rise in pollution of water and land resources.
This study estimated the environmental costs associated with the electricity
demand requirements of the coal electricity sector, as a component of the long-run
marginal opportunity cost (LR-MOC) of electricity production.