Showing 1–20 of 128 results for "Resource extraction"
  • Research over the past decade has demonstrated the feasibility of extracting topographic information of hydrological interest directly from digital elevation models (DEM). Techniques are available for extracting slope properties, catchment areas, drainage divides, channel networks and other data (Jenson and Domingue, 1988; Mark, 1988; Moore et al., 1991; Martz and Garbrecht, 1992). These techniques are faster and provide more precise and reproducible measurements than traditional manual techniques applied to topographic maps (Tribe, 1991)....

    PDF, 256 pages · 951628473 · 07-05-2012

  • From before the time Raven stole the sun and shed light on the world below, the Gitxaała people have lived in their territories along the north coast of British Columbia. Gitxaała laws (Ayaawk) and history (Adaawk) describe in precise detail the relationships of trust, honor, and respect that are appropriate for the well-being and continuance of the people and, as importantly, define the rights of ownership over land, sea, and resources within the territory.

    PDF, 281 pages · huetay_1 · 28-02-2013

  • Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping.

    PDF, 6 pages · hongdo_1 · 12-04-2013

  • Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of defining specific lexica, but they introduce lexical ambiguity. This paper proposes a methodology for building application-specific lexica by using WordNet. Lexical ambiguity is kept under control by marking synsets in WordNet with field labels taken from the Dewey Decimal Classification.

    PDF, 4 pages · bunthai_1 · 06-05-2013

  • Although researchers have conducted extensive studies on relation extraction in the last decade, supervised approaches are still limited because they require large amounts of training data to achieve high performance. To build a relation extractor without significant annotation effort, we can exploit cross-lingual annotation projection, which leverages parallel corpora as external resources for supervision.

    PDF, 6 pages · nghetay_1 · 07-04-2013

  • An important and well-studied problem is the production of semantic lexicons from a large corpus. In this paper, we present a system named ASIA (Automatic Set Instance Acquirer), which takes in the name of a semantic class as input (e.g., “car makers”) and automatically outputs its instances (e.g., “ford”, “nissan”, “toyota”). ASIA is based on recent advances in web-based set expansion: the problem of finding all instances of a set given a small number of “seed” instances.

    PDF, 9 pages · hongphan_1 · 14-04-2013

  • Automatically acquiring synonymous collocation pairs from corpora is a challenging task. For this task, we generally have a large monolingual corpus and/or a very limited bilingual corpus. Methods that use monolingual corpora alone or bilingual corpora alone are evidently inadequate because of low precision or low coverage. In this paper, we propose a method that uses both of these resources to achieve an optimal compromise between precision and coverage.

    PDF, 8 pages · bunbo_1 · 17-04-2013

  • In this demo we will present GATE, an architecture and framework for language engineering, and ANNIE, an information extraction system developed within it. We will demonstrate how ANNIE has been adapted to perform NE recognition in different languages, including Indic and Slavonic languages as well as Western European ones, and how the resources can be reused for new applications and languages.

    PDF, 4 pages · bunthai_1 · 06-05-2013

  • This paper presents a tool for extracting multi-word expressions from corpora in Modern Greek, which is used together with a parallel concordancer to augment the lexicon of a rule-based machine translation system. The tool is part of a larger extraction system that relies, in turn, on a multilingual parser developed over the past decade in our laboratory. The paper reviews the various NLP modules and resources that enable the retrieval of Greek multi-word expressions and their translations: the Greek parser, its lexical database, and the extraction and concordancing system. ...

    PDF, 4 pages · bunthai_1 · 06-05-2013

  • In this paper, we compare three different generalization methods for in-domain and cross-domain opinion holder extraction: simple unsupervised word clustering, an induction method inspired by distant supervision, and the use of lexical resources. The generalization methods are incorporated into diverse classifiers. We show that generalization yields significant improvements and that the size of the improvement depends on the type of classifier and on how much the training and test data differ from each other. ...

    PDF, 11 pages · bunthai_1 · 06-05-2013

  • We propose a novel approach to improve SMT via paraphrase rules which are automatically extracted from the bilingual training data. Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side.

    PDF, 9 pages · nghetay_1 · 07-04-2013

  • The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources) which are much more widely available than parallel translation data.

    PDF, 6 pages · nghetay_1 · 07-04-2013

  • Negative life events play an important role in triggering depressive episodes. Developing psychiatric services that can automatically identify such events is beneficial for mental health care and prevention. Before these services can be provided, meaningful semantic patterns have to be extracted. In this work, we present a text mining framework capable of inducing variable-length semantic patterns from unannotated psychiatry web resources.

    PDF, 8 pages · hongvang_1 · 16-04-2013

  • At least two kinds of relations exist among related words: taxonomical relations and thematic relations. Both relations identify related words useful to language understanding and generation, information retrieval, and so on. However, although words with taxonomical relations are easy to identify from linguistic resources such as dictionaries and thesauri, words with thematic relations are difficult to identify because they are rarely maintained in linguistic resources.

    PDF, 4 pages · hongvang_1 · 16-04-2013

  • We demonstrate TextRank, a system for unsupervised extractive summarization that relies on the application of iterative graph-based ranking algorithms to graphs encoding the cohesive structure of a text. An important characteristic of the system is that it does not rely on any language-specific knowledge resources or any manually constructed training data, and thus it is highly portable to new languages or domains.

    PDF, 4 pages · bunbo_1 · 17-04-2013

  • We present an approach using syntactosemantic rules for the extraction of relational information from biomedical abstracts. The results show that by overcoming the hurdle of technical terminology, high-precision results can be achieved. From abstracts related to baker’s yeast, we extract a regulatory network comprising 441 pairwise relations from 58,664 abstracts with an accuracy of 83–90%. To achieve this, we made use of a resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. ...

    PDF, 8 pages · bunbo_1 · 17-04-2013

  • ...database maintained by the National Library of Medicine (NLM), which incorporates around 40,000 Health Sciences papers each month. Researchers depend on these electronic resources to keep abreast of their rapidly changing field. In order to maintain and update vital indexing references such as the Unified Medical Language System (UMLS) resources and the MeSH and SPECIALIST vocabularies, the NLM staff needs to review 400,000 highly technical papers each year.

    PDF, 8 pages · bunbo_1 · 17-04-2013

  • In this paper we present a methodology for extracting subcategorisation frames based on an automatic LFG f-structure annotation algorithm for the Penn-II Treebank. We extract abstract syntactic function-based subcategorisation frames (LFG semantic forms), traditional CFG category-based subcategorisation frames, and mixed function/category-based frames, with or without preposition information for obliques and particle information for particle verbs.

    PDF, 8 pages · bunbo_1 · 17-04-2013

  • Foreign Direct Investment (FDI), i.e. investment by foreign companies in overseas subsidiaries or joint ventures, has traditionally relied on natural resource use and extraction, particularly agriculture, mineral and fuel production. Though this balance has shifted in recent years, the poorest countries still receive a disproportionate share of investment flows into their natural resource sectors.

    PDF, 100 pages · truongdoan · 10-11-2009

  • The rapid economic growth in Vietnam has resulted in an increasing demand for electricity. This in turn translates to a higher rate of coal resource extraction and consequent rise in pollution of water and land resources. This study estimated the environmental costs associated with the electricity demand requirements of the coal electricity sector, as a component of the long-run marginal opportunity cost (LR-MOC) of electricity production.

    PDF · hailedangbs · 02-05-2013
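
The TextRank entry in this listing describes unsupervised extractive summarization through iterative graph-based ranking. The following is a minimal Python sketch of that idea, not the authors' system. It assumes pre-split sentences, a word-overlap similarity normalized by the log of each sentence's word count, a damping factor of 0.85, and hypothetical function names (`tokenize`, `similarity`, `textrank`, `summarize`). The regex tokenizer only handles ASCII letters and digits, so other scripts would need a different tokenizer.

```python
import math
import re

# Hypothetical helpers sketching TextRank-style sentence ranking.
# The tokenizer, similarity measure and damping factor are assumptions.


def tokenize(sentence: str) -> set[str]:
    """Lowercased ASCII word set (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    """Shared-word count normalized by the log of each sentence's size."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0.0:  # both sentences have a single word
        return 0.0
    return len(a & b) / denom


def textrank(sentences: list[str], damping: float = 0.85,
             tol: float = 1e-6, max_iter: int = 100) -> list[float]:
    """Iteratively score sentences on a weighted, undirected similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Edge weights are pairwise similarities; no self-loops.
    weights = [[0.0 if i == j else similarity(tokens[i], tokens[j])
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Each node receives a share of every neighbour's score,
        # proportional to the connecting edge weight.
        new_scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if weights[j][i] > 0)
            for i in range(n)
        ]
        delta = max(abs(new - old) for new, old in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:  # stop once scores have stabilized
            break
    return scores


def summarize(sentences: list[str], k: int = 2) -> list[str]:
    """Return the k highest-ranked sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(sentences, k=2)` returns the two highest-scoring sentences in document order. A sentence that shares no words with any other receives no incoming weight, so it stays at the minimum score of `1 - damping`.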
