Showing 1-20 of 36 results for "Using Wikipedia"
  • Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. ...

  • We describe an unsupervised approach to multi-document, sentence-extraction-based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from the Wikipedia corpus. We evaluate our work on the DUC2004 evaluation data and with human judges.

  • We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.
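A minimal sketch of how such silver-standard hedge labels can be read off wikitext, assuming inline cleanup templates like {{weasel-inline}} or {{who}} mark the hedged claims (the template names here are illustrative, not necessarily the paper's exact inventory):

```python
import re

# Inline cleanup templates that editors attach to vague, non-factual claims.
# Sentences carrying one of these tags are treated as positive (hedged)
# training examples; all others as negative.
WEASEL_TEMPLATES = re.compile(r"\{\{\s*(weasel[ -]?inline|who|by whom)\s*[|}]", re.I)

def label_sentences(wikitext_sentences):
    """Return (clean_sentence, label) pairs: 1 if a weasel tag was attached."""
    return [(re.sub(r"\{\{[^{}]*\}\}", "", s).strip(),
             1 if WEASEL_TEMPLATES.search(s) else 0)
            for s in wikitext_sentences]

sents = [
    "Some people say the plan failed.{{weasel-inline}}",
    "The plan was approved on 4 May 2001.",
]
print(label_sentences(sents))
```

The cleaned sentences (templates stripped) can then feed a classifier built on corpus statistics and shallow syntactic features, as the abstract describes.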

  • Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold standard corpora on cross-corpus evaluation by up to 11%. ...

  • In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structure of human-authored texts to automatically induce a domain-specific template for the topic structure of a new overview. The algorithmic innovation of our work is a method to learn topic-specific extractors for content selection jointly for the entire template.

  • This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using the respective characteristics of Wikipedia articles and a Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information on the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. ...

  • We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated.
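The revision-mining step can be approximated as follows: align the tokens of two revisions of a sentence and keep single-token substitutions where both sides are in-vocabulary words, which is exactly the real-word-error pattern. This is an illustrative sketch, not the paper's pipeline; the toy vocabulary stands in for a real dictionary:

```python
import difflib

VOCAB = {"the", "board", "bored", "meeting", "was", "long"}  # toy dictionary

def realword_error_pairs(old_tokens, new_tokens, vocab=VOCAB):
    """Find single-token substitutions between two revisions where both the
    replaced and the replacing token are valid words."""
    pairs = []
    sm = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            err, fix = old_tokens[i1], new_tokens[j1]
            if err in vocab and fix in vocab and err != fix:
                pairs.append((err, fix))
    return pairs

print(realword_error_pairs(
    "the bored meeting was long".split(),
    "the board meeting was long".split()))
# → [('bored', 'board')]
```

Keeping the surrounding tokens of each match yields the naturally occurring error plus its context, which is what the evaluation needs.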

  • In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences.

  • Is it possible to use sense inventories to improve Web search results diversity for one word queries? To answer this question, we focus on two broad-coverage lexical resources of a different nature: WordNet, as a de-facto standard used in Word Sense Disambiguation experiments; and Wikipedia, as a large coverage, updated encyclopaedic resource which may have a better coverage of relevant senses in Web pages.

  • A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus.
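A sketch of the core pair test, under the assumption that a compression is an edit that only deletes words, so the shorter revision's tokens must appear in order inside the longer one (the actual mining additionally aligns sentences across adjacent revisions):

```python
def is_compression(long_sent, short_sent):
    """True if short_sent's tokens appear in order inside long_sent's tokens,
    i.e. the edit between revisions only deleted words."""
    longer, shorter = long_sent.split(), short_sent.split()
    if len(shorter) >= len(longer):
        return False
    it = iter(longer)
    # `tok in it` advances the iterator, enforcing in-order matching.
    return all(tok in it for tok in shorter)

before = "The committee finally approved the controversial budget on Tuesday"
after = "The committee approved the budget Tuesday"
print(is_compression(before, after))  # → True
```

Applied to every pair of adjacent revisions of every Wikipedia sentence, a filter like this yields compression (and, reversed, expansion) pairs at scale.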

  • A topic model outputs a multinomial distribution over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods which only use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space.
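The comparison in topic space can be sketched like this: represent each word by its per-topic distribution from the shared bilingual topic model and rank candidate translations by cosine similarity. The vocabularies and probabilities below are invented for illustration:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy per-word topic distributions P(topic | word) over a shared topic space.
en_topics = {"water": [0.8, 0.1, 0.1], "fire": [0.1, 0.8, 0.1]}
it_topics = {"acqua": [0.75, 0.15, 0.10], "fuoco": [0.05, 0.85, 0.10]}

def best_translation(word):
    """Rank Italian candidates by topic-vector similarity to the English word."""
    return max(it_topics, key=lambda c: cosine(en_topics[word], it_topics[c]))

print(best_translation("water"))  # → acqua
```

The point of the abstract is that this topic-space similarity beats similarity computed in the original word-document space, since the topic space is shared across the two languages.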

  • In this paper, we extend distant supervision (DS) based on Wikipedia for Relation Extraction (RE) by considering (i) relations defined in external repositories, e.g. YAGO, and (ii) any subset of Wikipedia documents. We show that training data constituted by sentences containing pairs of named entities in target relations is enough to produce reliable supervision.
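The distant-supervision heuristic itself is simple to sketch: any sentence mentioning an entity pair that the KB lists under relation r becomes a (noisy) positive example for r. The toy KB facts below are illustrative, not actual YAGO entries, and the string matching is deliberately naive:

```python
# Toy KB of (subject, object) -> relation facts in the style of YAGO.
KB = {("Rome", "Italy"): "isLocatedIn", ("Dante", "Florence"): "wasBornIn"}

def distant_label(sentences):
    """Turn raw sentences into relation-labeled training examples by naive
    substring matching of KB entity pairs (the distant-supervision assumption)."""
    data = []
    for s in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in s and e2 in s:
                data.append((s, e1, e2, rel))
    return data

print(distant_label([
    "Rome is the largest city in Italy .",
    "Dante spent his youth in Florence .",
]))
```

A real pipeline would use named-entity recognition rather than substring matching, but the abstract's claim is precisely that data built this way is enough for reliable supervision.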

  • The objectives for this project were essentially straightforward: design an interactive web-based application (as opposed to the creation and modification of a website) that lets users contribute graphic-design history content within the structure of a timeline. The central theme for the web application is the timeline, and this is where the Graphic Design History Interactive Timeline (GDHit) differs from Wikipedia, TDE.org, or other user-generated databases.

  • Due to Arabic's morphological complexity, Arabic retrieval benefits greatly from morphological analysis, particularly stemming. However, the best-known stemmers do not handle linguistic phenomena such as broken plurals and malformed stems. In this paper we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext-to-page-title links.
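The supervision source can be sketched as mining (anchor text, page title) pairs from wikitext links, which yields noisy surface-form-to-canonical-form examples for training a transformation model (the link-syntax handling here is simplified, and the Arabic example — "kutub", a broken plural of "kitab" — is purely illustrative):

```python
import re

# Match piped wikitext links [[Title|anchor]]: the anchor is the surface form
# seen in running text, the title its canonical form.
LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]]+)\]\]")

def link_pairs(wikitext):
    """Extract (surface form, canonical form) pairs from piped wiki links."""
    return [(anchor.strip(), title.strip())
            for title, anchor in LINK.findall(wikitext)]

text = "See [[kitab|kutub]] for the plural form."
print(link_pairs(text))  # → [('kutub', 'kitab')]
```

Pairs like these supply exactly the surface-to-stem mappings that rule-based stemmers miss for broken plurals.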

  • This paper presents a novel approach to automatic captioning of geo-tagged images by summarizing multiple web documents that contain information related to an image's location. The summarizer is biased by dependency pattern models towards sentences which contain features typically provided for different scene types such as those of churches, bridges, etc. Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries. ...

  • From Wikipedia, the free encyclopedia. This article is about general aspects of water. For a detailed discussion of its properties, see Properties of water. For other uses, see Water (disambiguation). ...

  • An applied field of biology that involves the use of living organisms and bioprocesses in engineering, technology, medicine and other fields requiring bioproducts (http://en.wikipedia.org/wiki/Biotechnology). It builds on the products and methods of other fundamental sciences. Traditional biotechnology refers to a number of ancient ways of using living organisms to make new products or modify existing ones. In its broadest definition, traditional biotechnology can be traced back to humans' transition from hunter-gatherer to farmer.

  • JYTHON: Even though web2py runs with Jython out of the box, there is some trickery involved in setting up Jython and in setting up zxJDBC (the Jython database adaptor). Here are the instructions:
    • Download the file "jython_installer-2.5.0.jar" (or 2.5.x) from Jython.org
    • Install it: java -jar jython_installer-2.5.0.jar
    • Download and install "zxJDBC.jar" from http://sourceforge.net/projects/zxjdbc/
    • Download and install the file "sqlitejdbc-v056.jar" from http://www.zentus.

  • Doctorow, Cory. Published: 2010. Categories: Fiction, Science Fiction, Short Stories. Source: http://craphound.com/walh/e-book/browse-all-versions. About Doctorow: Cory Doctorow (born July 17, 1971) is a blogger, journalist and science fiction author who serves as co-editor of the blog Boing Boing. He is in favor of liberalizing copyright laws, a proponent of the Creative Commons organisation, and uses some of their licenses for his books. Some common themes of his work include digital rights management, file sharing, Disney, and post-scarcity economics.

  • Legibility and focus on the content were primary design considerations, as it is hoped the timeline will grow through a wealth of contributions. Early research into incorporating imagery into the timeline revealed the hurdle of copyrighted material on the World Wide Web, and the entanglement of academic fair use, public-domain images, partially copyrighted material, and fully copyrighted material.
