Showing 1-20 of 474 results for "Data on the web"
  • Until very recently, most NLP tasks (e.g., parsing, tagging, etc.) have been confined to a very limited number of languages, the so-called majority languages. Now, as the field moves into the era of developing tools for Resource Poor Languages (RPLs)—a vast majority of the world’s 7,000 languages are resource poor—the discipline is confronted not only with the algorithmic challenges of limited data, but also the sheer difficulty of locating data in the first place.

    pdf4p bunthai_1 06-05-2013 14 2   Download

  • As the arm of NLP technologies extends beyond a small core of languages, techniques for working with instances of language data across hundreds to thousands of languages may require revisiting and recalibrating the tried and true methods that are used. One NLP technique that has been treated as “solved” is language identification (language ID) of written text. However, we argue that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web.

    pdf9p bunthai_1 06-05-2013 18 2   Download

  • Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross-language information retrieval. In this paper, based on the observation that bilingual data in many web pages appear collectively following similar patterns, an adaptive pattern-based bilingual data mining method is proposed.

    pdf9p hongphan_1 14-04-2013 14 3   Download

  • A collection of medical research reports published in the journal Critical Care, offering readers further knowledge of the field of medicine. Topic: Publishing Chinese medicine knowledge as Linked Data on the Web...

    pdf12p thulanh18 28-10-2011 17 2   Download

  • We apply pattern-based methods for collecting hypernym relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web enables obtaining good results with relatively unsophisticated techniques.

    pdf4p hongvang_1 16-04-2013 18 2   Download
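As a rough illustration of what a pattern-based approach to collecting hypernym relations can look like, here is a minimal Python sketch built on a single "X such as A, B and C" lexical pattern. The pattern, the function name `extract_hypernyms`, and the sample sentence are assumptions made for illustration; the abstract above does not state which patterns the authors actually use.

```python
import re

# Hypothetical lexical pattern: "<hypernym> such as <w1>, <w2> and <w3>".
# Only single-word hypernyms and hyponyms are handled in this sketch.
SUCH_AS = re.compile(
    r"\b(\w+),? such as "
    r"(\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

# Separators inside the enumeration: ", ", " and ", ", and ", " or ", ", or ".
SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        for hyponym in SEPARATOR.split(match.group(2)):
            if hyponym:
                pairs.append((hyponym.lower(), hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "He likes fruits such as apples, pears, and plums."
    for hyponym, hypernym in extract_hypernyms(sentence):
        print(f"{hyponym} is a kind of {hypernym}")
```

A single surface pattern like this is noisy on its own; the abstract's point is that the sheer volume of web text can compensate for such simple techniques.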

  • Hi! Thanks for picking up my book. I sincerely hope that it finds its way to a convenient spot on your desk. Nothing would warm my heart more than to see a beat-down, dogeared, coffee-stained copy of this book right next to your computer. On the other hand, it would drive me nuts if you bought this book only to discover that it didn’t address your needs. In the spirit of customer satisfaction, please read the following introduction to get a sense of where I’m coming from, and whether you might get some good use out of this book....

    pdf263p tailieuvip14 26-07-2012 22 7   Download

  • This paper presents a new web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree alignment model is proposed to identify the translationally equivalent texts and hyperlinks between two parallel DOM trees. By tracing the identified parallel hyperlinks, parallel web documents are recursively mined. Compared with previous mining schemes, the benchmarks show that this new mining scheme improves the mining coverage, reduces mining bandwidth, and enhances the quality of mined parallel sentences.

    pdf8p hongvang_1 16-04-2013 20 3   Download

  • XSQL isn't just some razzle-dazzle technology. It allows you to easily leverage the most robust, mature, and usable technologies in the industry: SQL, HTML, HTTP, XML, Java, and the Oracle RDBMS. With an exciting first look at XSQL, this innovative book shows you how to bring all of these powerful technologies together in order to publish dynamic Web content. You'll first find a comprehensive discussion of how XSQL relates to each of these technologies. Then you'll learn how you can use XSQL to present your database data on the Web instantly. The numerous code examples will show you how to...

    pdf593p suthebeo 17-07-2012 52 14   Download

  • The existence of different autonomous Web sites containing related information has given rise to the problem of integrating these sources effectively to provide a comprehensive integrated source of relevant information. The advent of e-commerce and the increasing trend of availability of commercial data on the Web has generated the need to analyze and manipulate these data to support corporate decision making. Decision support systems now must be able to harness and analyze Web data to provide organizations with a competitive edge.

    pdf488p hotmoingay3 09-01-2013 17 4   Download

  • The best practice today is to read data into the SAS environment for processing. For highly repeatable processes, this might not be efficient because it takes time to transfer the data, and resources are used to store it temporarily in the SAS environment. In some cases, the results of the SAS processing must be transferred back to the DBMS for final storage, which further increases the cost. Addressing this challenge can result in improved resource utilization and enable companies to answer business questions more quickly. ...

    pdf10p yasuyidol 02-04-2013 18 4   Download

  • We argue for the need for systems that output fewer terms, but with a higher precision. Moreover, all the above were conducted on language pairs including English. It would be possible, albeit more difficult, to obtain comparable corpora for pairs such as French-Japanese. We will try to remove the need to gather corpora beforehand altogether. To achieve this, we use the web as our only source of data. This idea is not new, and has already been tried by Cao and Li (2002) for base noun phrase translation. ...

    pdf0p bunthai_1 06-05-2013 17 1   Download

  • Lichens have been used to study air pollution chemistry in national parks and forests since the 1980s (Figures 1 and 2). There have also been a few lichen studies on national wildlife refuges. Most of the studies have been floristic studies, reports of baseline concentrations of elements in lichen tissues and, occasionally, trends in these concentrations. Figure 1 shows park and refuge locations with tissue chemistry data. USGS Biological Resources Division maintains a web site listing lichens known from each of the national parks shown on the map (http://www.ies.wisc.

    pdf52p saimatkhauroi 01-02-2013 24 7   Download

  • In this book for designers, developers, and product managers, expert developer and user interface designer Lukas Mathis explains how to make usability the cornerstone of every point in your design process, walking you through the necessary steps to plan the design for an application or website, test it, and get usage data after the design is complete. He shows you how to focus your design process on the most important thing: helping people get things done, easily and efficiently.

    pdf315p caucaphung 04-02-2013 22 5   Download

  • The US government has cut back on its reporting over time, and its web pages now do little more than report on current events. Unlike the Iraq War, there is no Department of Defense quarterly report on the progress of the war and on efforts to create effective Afghan security, governance, and development. There is no equivalent to the State Department weekly status report. Testimony to Congress, while useful, does not provide detailed statements or backup slides with maps, graphs, and other data on the course of the war.

    pdf28p thamgiacongdong 02-05-2013 18 3   Download

  • Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpus by downloading web pages to create a topic-diverse collection of 10 billion words of English. We show that for context-sensitive spelling correction the Web Corpus results are better than using a search engine. For thesaurus extraction, it achieved similar overall results to a corpus of newspaper text.

    pdf8p bunthai_1 06-05-2013 24 3   Download

  • In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. With Beautiful Data, you will: Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web

    pdf384p stingdau_123 19-01-2013 16 2   Download

  • Requests for permission to reproduce or translate WHO publications – whether for sale or for noncommercial distribution – should be addressed to WHO Press through the WHO web site (http://www.who.int/about/licensing/copyright_form/en/index.html). The designations employed and the presentation of the material in this publication do not imply the expression of any opinion whatsoever on the part of the World Health Organization concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.

    pdf92p bin_pham 05-02-2013 9 2   Download

  • This paper proposes to solve the bottleneck of finding training data for word sense disambiguation (WSD) in the domain of web queries, where a complete set of ambiguous word senses is unknown. In this paper, we present a method combining active learning and semi-supervised learning to treat the case in which only positive examples, which have the expected word sense in web search results, are given. The novelty of our approach is the use of “pseudo negative examples” with reliable confidence scores estimated by a classifier trained on positive and unlabeled examples.

    pdf4p hongphan_1 15-04-2013 23 2   Download

  • Scattered throughout the tutorial there are a number of sections devoted more to explaining the basics of XML than to programming exercises. They are listed here so as to form an XML thread you can follow without covering the entire programming tutorial: Understanding XML and the Java XML APIs explains the basics of XML and gives you a guide to the acronyms associated with it. It also provides an overview of the Java™ XML APIs you can use to manipulate XML-based data, including the Java API for XML Parsing (JAXP).

    pdf494p doxuan 07-08-2009 383 176   Download

  • Our previous tutorial discussed the basics of XML and demonstrated its potential to revolutionize the Web. In this tutorial, we’ll discuss how to use an XML parser to:
      • Process an XML document
      • Create an XML document
      • Manipulate an XML document
    We’ll also talk about some useful, lesser-known features of XML parsers. Best of all, every tool discussed here is freely available at IBM’s alphaWorks site (www.alphaworks.ibm.com) and other places on the Web.

    pdf59p doxuan 07-08-2009 440 169   Download
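The three parser tasks named in the tutorial description above (process, create, and manipulate an XML document) can be sketched briefly. This is only an illustration, assuming Python's standard-library `xml.etree.ElementTree` rather than the Java parsers the tutorial itself covers; the `library`/`book` document and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this sketch.
SOURCE = "<library><book id='1'><title>XML Basics</title></book></library>"


def process(xml_text):
    """Process: parse the text and collect every book title."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title") for book in root.iter("book")]


def create():
    """Create: build a new document tree in memory."""
    root = ET.Element("library")
    book = ET.SubElement(root, "book", {"id": "2"})
    ET.SubElement(book, "title").text = "Parsing XML"
    return root


def manipulate(root):
    """Manipulate: add an attribute to each book, then serialize the tree."""
    for book in root.iter("book"):
        book.set("checked", "yes")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(process(SOURCE))
    print(manipulate(create()))
```

The same three steps map onto DOM-style APIs in other languages; only the library calls differ.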
