Xem 1-20 trên 175 kết quả Wikipedia
  • Nếu bạn sở hữu 1 chiếc iPod và tất nhiên là dung lượng trong máy vẫn còn trống. Hãy thử cài đặt Wikipedia - Cuốn bách khoa toàn thư của thế giới lên chiếc máy của mình. Sở hữu cuốn bách khoa toàn thư Wikipedia sẽ giúp bạn thuận tiện tra cứu nhanh thông tin trong iPod chỉ với 1 dung lượng 1,7 GB.

    pdf4p chocopielaorion 31-03-2011 45 5   Download

  • Want to be part of the largest group-writing project in human history? Learn how to contribute to Wikipedia, the user-generated online reference for the 21st century. Considered more popular than eBay,, and, Wikipedia generates approximately 30,000 requests per second, or about 2.5 billion per day. It's become the first point of reference for people the world over who need a fact fast. If you want to jump on board and add to the content, Wikipedia: The Missing Manual is your first-class ticket.

    pdf498p stingdau_123 21-01-2013 21 4   Download

  • Wikipedia provides a wealth of knowledge, where the first sentence, infobox (and relevant sentences), and even the entire document of a wiki article could be considered as diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries with various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia.

    pdf9p hongphan_1 14-04-2013 11 3   Download

  • Disambiguating concepts and entities in a context sensitive way is a fundamental problem in natural language processing. The comprehensiveness of Wikipedia has made the online encyclopedia an increasingly popular target for disambiguation. Disambiguation to Wikipedia is similar to a traditional Word Sense Disambiguation task, but distinct in that the Wikipedia link structure provides additional information about which disambiguations are compatible.

    pdf10p hongdo_1 12-04-2013 12 2   Download

  • Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work to date relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection.

    pdf6p hongdo_1 12-04-2013 19 2   Download

  • In this paper we examine the task of sentence simplification which aims to reduce the reading complexity of a sentence by incorporating more accessible vocabulary and sentence structure. We introduce a new data set that pairs English Wikipedia with Simple English Wikipedia and is orders of magnitude larger than any previously examined for sentence simplification.

    pdf5p hongdo_1 12-04-2013 23 2   Download

  • This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms. We present extraction methods geared to cover the broad range of the lexical reference relation and analyze them extensively. Most extraction methods yield high precision levels, and our rule-base is shown to perform better than other automatically constructed baselines in a couple of lexical expansion and matching tasks. Our rule-base yields comparable performance to WordNet while providing largely complementary information. ...

    pdf9p hongphan_1 14-04-2013 15 2   Download

  • This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using respective characteristics of Wikipedia articles and Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information related to the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. ...

    pdf9p hongphan_1 14-04-2013 19 2   Download

  • We describe an unsupervised approach to multi-document sentence-extraction based summarization for the task of producing biographies. We utilize Wikipedia to automatically construct a corpus of biographical sentences and TDT4 to construct a corpus of non-biographical sentences. We build a biographical-sentence classifier from these corpora and an SVM regression model for sentence ordering from the Wikipedia corpus. We evaluate our work on the DUC2004 evaluation data and with human judges.

    pdf9p hongphan_1 15-04-2013 14 2   Download

  • The API computes semantic relatedness by: 1. taking a pair of words as input; 2. retrieving the Wikipedia articles they refer to (via a disambiguation strategy based on the link structure of the articles); 3. computing paths in the Wikipedia categorization graph between the categories the articles are assigned to; 4. returning as output the set of paths found, scored according to some measure definition. The implementation includes path-length (Rada et al., 1989; Wu & Palmer, 1994; Leacock & Chodorow, 1998), information-content (Resnik, 1995; Seco et al.

    pdf4p hongvang_1 16-04-2013 21 2   Download

  • We consider the problem of NER in Arabic Wikipedia, a semisupervised domain adaptation setting for which we have no labeled training data in the target domain. To facilitate evaluation, we obtain annotations for articles in four topical groups, allowing annotators to identify domain-specific entity types in addition to standard categories. Standard supervised learning on newswire text leads to poor target-domain recall.

    pdf12p bunthai_1 06-05-2013 21 2   Download

  • We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated.

    pdf10p bunthai_1 06-05-2013 22 2   Download

  • In this paper, we propose an annotation schema for the discourse analysis of Wikipedia Talk pages aimed at the coordination efforts for article improvement. We apply the annotation schema to a corpus of 100 Talk pages from the Simple English Wikipedia and make the resulting dataset freely available for download1 . Furthermore, we perform automatic dialog act classification on Wikipedia discussions and achieve an average F1 -score of 0.82 with our classification pipeline.

    pdf10p bunthai_1 06-05-2013 21 2   Download

  • In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences.

    pdf9p nghetay_1 07-04-2013 16 1   Download

  • Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations.

    pdf10p hongdo_1 12-04-2013 13 1   Download

  • Is it possible to use sense inventories to improve Web search results diversity for one word queries? To answer this question, we focus on two broad-coverage lexical resources of a different nature: WordNet, as a de-facto standard used in Word Sense Disambiguation experiments; and Wikipedia, as a large coverage, updated encyclopaedic resource which may have a better coverage of relevant senses in Web pages.

    pdf10p hongdo_1 12-04-2013 13 1   Download

  • We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. ...

    pdf6p hongdo_1 12-04-2013 18 1   Download

  • A well-recognized limitation of research on supervised sentence compression is the dearth of available training data. We propose a new and bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, we have collected a training corpus of over 380,000 sentence pairs, two orders of magnitude larger than the standardly used Ziff-Davis corpus.

    pdf4p hongphan_1 15-04-2013 13 1   Download

  • This paper describes the online demo of the QuALiM Question Answering system. While the system actually gets answers from the web by querying major search engines, during presentation answers are supplemented with relevant passages from Wikipedia. We believe that this additional information improves a user’s search experience.

    pdf4p hongphan_1 15-04-2013 18 1   Download

  • We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.

    pdf4p hongphan_1 15-04-2013 39 1   Download


Đồng bộ tài khoản