Interactive Data Visualization for the Web makes these skills available at an introductory level for designers and visual artists without programming experience, journalists interested in the emerging data journalism processes, and others keenly interested in visualization and publicly available data sources.
Get a practical introduction to data visualization that is accessible to beginners
This is not one of your “Learn HTML in 24 Hours” books, nor is it one of the many introductory books on web graphics. It won’t teach you how to imitate the stylistic tricks of famous web designers, turn ugly typography into ugly 3-D typography, or build online shopping carts by bouncing databases from one cryptic programming environment to another. This is a book for working designers who seek to understand the Web as a medium and learn how they can move to a career in web design. It’s also suited to designers who wish to add web design to their repertoire of client services....
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross-language information retrieval. In this paper, based on the observation that bilingual data in many web pages appear collectively and follow similar patterns, we propose an adaptive pattern-based bilingual data mining method.
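The observation that bilingual items appear collectively under a shared layout can be sketched in a few lines of Python. The fixed regular expression, the threshold, and the sample lines below are illustrative assumptions; the paper itself learns page-specific patterns adaptively rather than using one hard-coded rule:

```python
import re

# Hedged sketch: if many lines on a page match the same
# "foreign text (English gloss)" layout, treat the page as a collective
# bilingual list and harvest the pairs; isolated matches are discarded.
PAIR = re.compile(r"^([\u4e00-\u9fff]+)\s*\(([A-Za-z ,'-]+)\)\s*$")

def mine_bilingual_pairs(page_lines, min_collective=3):
    """Return (chinese, english) pairs only if the pattern repeats."""
    hits = [m.groups() for line in page_lines
            if (m := PAIR.match(line.strip()))]
    return hits if len(hits) >= min_collective else []

sample = ["机器翻译 (machine translation)",
          "信息检索 (information retrieval)",
          "自然语言处理 (natural language processing)"]
print(mine_bilingual_pairs(sample))
```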
Esfinge is a general-domain Portuguese question answering system. It tries to take advantage of the great amount of information available on the World Wide Web. Since Portuguese is one of the most widely used languages on the web, and the web itself is a constantly growing source of up-to-date information, such techniques are quite interesting and promising.
We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. ...
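The central idea, treating definitions of the same concept as a tiny parallel corpus, can be sketched as follows. The grouping key, the overlap filter, and the sample definitions are simplifying assumptions rather than the authors' actual extraction pipeline:

```python
from collections import defaultdict
from itertools import combinations

# Hedged sketch: sentences defining the same concept form a small
# "parallel corpus"; every pair within a group is a paraphrase candidate.
# Real systems add alignment and precision filters on top of this.
def paraphrase_candidates(definitions):
    """definitions: iterable of (concept, sentence) pairs."""
    by_concept = defaultdict(list)
    for concept, sentence in definitions:
        by_concept[concept.lower()].append(sentence)
    for concept, sents in by_concept.items():
        for s1, s2 in combinations(sents, 2):
            # crude filter: require some lexical overlap between the pair
            if len(set(s1.lower().split()) & set(s2.lower().split())) >= 2:
                yield concept, s1, s2

defs = [("firewall", "A firewall is a system that blocks unauthorized network access."),
        ("firewall", "A firewall is software or hardware that stops unauthorized access to a network.")]
for c, a, b in paraphrase_candidates(defs):
    print(c, "|", a, "|", b)
```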
This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using the respective characteristics of Wikipedia articles and a Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information on the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. ...
We apply pattern-based methods for collecting hypernym relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web makes it possible to obtain good results with relatively unsophisticated techniques.
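A minimal sketch of what such a pattern-based collection step can look like, using the classic Hearst-style "such as" pattern. The single-word capture and the sample sentence are simplifying assumptions; web-scale redundancy is what makes even this crude rule useful in practice:

```python
import re

# One Hearst-style pattern, "X such as Y". Captures only single-word terms.
SUCH_AS = re.compile(r"\b(\w+)\s+such as\s+(\w+)", re.IGNORECASE)

def extract_hypernyms(text):
    """Yield (hyponym, hypernym) pairs from raw text."""
    for m in SUCH_AS.finditer(text):
        yield m.group(2).lower(), m.group(1).lower()

snippet = "He plays instruments such as guitar and studies languages such as Basque."
print(list(extract_hypernyms(snippet)))
# [('guitar', 'instruments'), ('basque', 'languages')]
```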
We propose a method to generate large-scale encyclopedic knowledge, which is valuable for a wide range of NLP research, based on the Web. We first search the Web for pages containing a term in question. Then we use linguistic patterns and HTML structures to extract text fragments describing the term. Finally, we organize the extracted term descriptions based on word senses and domains. In addition, we apply an automatically generated encyclopedia to a question answering system targeting the Japanese Information Technology Engineers Examination. ...
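The pattern-matching step can be sketched in a few lines. This hedged example covers only the linguistic-pattern part (one "TERM is a ..." pattern over search snippets), leaving out the HTML-structure cues and the sense/domain organization; the snippets are illustrative:

```python
import re

# Hedged sketch: keep sentence fragments from search snippets that match a
# definitional pattern for the queried term.
def describe(term, snippets):
    pat = re.compile(rf"\b{re.escape(term)}\s+is\s+(?:a|an|the)\s+([^.]+)\.",
                     re.IGNORECASE)
    return [m.group(1).strip() for s in snippets for m in pat.finditer(s)]

snips = ["A compiler is a program that translates source code into machine code.",
         "Compiler is a tool used to build software."]
print(describe("compiler", snips))
```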
As the arm of NLP technologies extends beyond a small core of languages, techniques for working with instances of language data across hundreds to thousands of languages may require revisiting and recalibrating the tried and true methods that are used. One of the NLP techniques that has been treated as “solved” is language identification (language ID) of written text. However, we argue that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web.
Until very recently, most NLP tasks (e.g., parsing, tagging, etc.) have been confined to a very limited number of languages, the so-called majority languages. Now, as the field moves into the era of developing tools for Resource Poor Languages (RPLs), a vast majority of the world’s 7,000 languages being resource poor, the discipline is confronted not only with the algorithmic challenges of limited data, but also the sheer difficulty of locating data in the first place.
In this paper, we present a new method for learning to find translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model.
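The annotation-and-training step lends itself to a small sketch. The following is a hedged, minimal example using the third-party sklearn-crfsuite package; the tokenized snippet, the BIO labels, and the features are toy assumptions standing in for the paper's actual tag and feature set:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

# Each mixed-code snippet is tokenized and tagged with B/I/O labels marking
# the translation span; features here are toy stand-ins.
def token_features(tokens, i):
    tok = tokens[i]
    return {"lower": tok.lower(),
            "is_ascii": tok.isascii(),  # a script switch is a strong cue
            "prev": tokens[i - 1].lower() if i else "<s>"}

def featurize(snippets):
    return [[token_features(toks, i) for i in range(len(toks))]
            for toks in snippets]

# Hypothetical tiny training set: one snippet, gold BIO labels marking the
# translation of the English term.
X = featurize([["Tokyo", "(", "東京", ")", "is", "a", "city"]])
y = [["O", "O", "B-TRANS", "O", "O", "O", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```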
We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text.
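Since the key idea here is inference from comparable objects, a hedged one-function sketch follows. The comparable set, the values, and the use of the median as the aggregator are all illustrative assumptions, not the paper's actual estimator:

```python
from statistics import median

# Sketch of the aggregation idea: when no exact "height of X" value is found
# in text, pool values mined for comparable objects and return a robust estimate.
def approximate(values_for_comparables):
    """values_for_comparables: numeric attribute values mined for similar objects."""
    return median(values_for_comparables)

# e.g. heights (cm) mined for other objects judged comparable to the query object
print(approximate([95.0, 110.0, 102.5, 98.0]))  # -> 100.25
```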
We present a novel approach to weakly supervised semantic class learning from the web, using a single powerful hyponym pattern combined with graph structures, which capture two properties associated with pattern-based extractions: popularity and productivity. Intuitively, a candidate is popular if it was discovered many times by other instances in the hyponym pattern. A candidate is productive if it frequently leads to the discovery of other instances.
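The two graph properties can be made concrete with a short sketch. This is an assumption-laden simplification: it reduces popularity and productivity to in- and out-degree counts over extraction events, whereas the paper works with richer graph structures:

```python
from collections import defaultdict

# Edges point from the instance that triggered a hyponym-pattern match to
# the instance it discovered; the sample edges are illustrative.
def popularity_productivity(edges):
    """edges: iterable of (discoverer, discovered) extraction events."""
    pop, prod = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        pop[dst] += 1    # popular: discovered many times by other instances
        prod[src] += 1   # productive: frequently leads to new discoveries
    return pop, prod

edges = [("city", "paris"), ("city", "london"), ("capital", "paris")]
pop, prod = popularity_productivity(edges)
print(pop["paris"], prod["city"])  # -> 2 2
```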
This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimal prior knowledge about machine transliteration and acquires knowledge iteratively from the Web. We study active learning and unsupervised learning strategies that minimize human supervision in terms of data labeling. The learning process refines the PSM and constructs a transliteration lexicon at the same time. ...
We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.
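The bag-of-sentences construction that carries the weak supervision can be sketched as follows. The corpus and entity pair are illustrative; a real system would draw the sentences from web search results:

```python
import re

# Hedged sketch of the bag-construction step: for each entity pair labeled
# as holding (or not holding) the relation, gather every sentence mentioning
# both entities; the bag, not any single sentence, carries the weak label.
def build_bag(e1, e2, sentences):
    pat1 = re.compile(rf"\b{re.escape(e1)}\b", re.IGNORECASE)
    pat2 = re.compile(rf"\b{re.escape(e2)}\b", re.IGNORECASE)
    return [s for s in sentences if pat1.search(s) and pat2.search(s)]

corpus = ["Paris is the capital of France.",
          "France beat Brazil in Paris.",
          "Berlin is in Germany."]
print(build_bag("Paris", "France", corpus))  # first two sentences form the bag
```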
This paper presents an approach for the automatic acquisition of qualia structures for nouns from the Web, and thus opens the possibility of exploring the impact of qualia structures on natural language processing at a larger scale. The approach builds on earlier work based on the idea of matching specific lexico-syntactic patterns conveying a certain semantic relation on the World Wide Web using standard search engines. In our approach, the qualia elements are actually ranked for each qualia role with respect to some measure. ...
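As a rough illustration of matching a lexico-syntactic pattern for one qualia role and ranking the fillers, here is a minimal sketch. The telic pattern, the measure (raw frequency), and the snippets are assumptions, not the paper's actual pattern set or ranking measure:

```python
import re
from collections import Counter

# One pattern for the telic (purpose) role: "NOUN is/are used to VERB";
# candidate fillers are ranked by how often they occur across snippets.
def telic_candidates(noun, snippets):
    pat = re.compile(rf"\b{re.escape(noun)}s?\s+(?:is|are)\s+used\s+to\s+(\w+)",
                     re.IGNORECASE)
    counts = Counter(m.group(1).lower() for s in snippets for m in pat.finditer(s))
    return counts.most_common()

snips = ["A knife is used to cut bread.",
         "The knife is used to cut vegetables.",
         "A knife is used to spread butter."]
print(telic_candidates("knife", snips))  # [('cut', 2), ('spread', 1)]
```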
We argue for the need for systems that output fewer terms, but with higher precision. Moreover, all of the above studies were conducted on language pairs that include English. It would be possible, albeit more difficult, to obtain comparable corpora for pairs such as French-Japanese. We aim to remove the need to gather corpora beforehand altogether. To achieve this, we use the web as our only source of data. This idea is not new, and has already been tried by Cao and Li (2002) for base noun phrase translation. ...