Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors.
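A minimal sketch of the heuristic labeling step, assuming the usual co-occurrence heuristic (a sentence mentioning both arguments of a KB triple is taken as a positive example of that relation); the KB triples and corpus below are invented for illustration:

```python
# Hedged sketch of knowledge-based weak supervision (distant supervision):
# any sentence mentioning both arguments of a KB triple is heuristically
# labeled as a positive example of that relation. All data here is invented.

KB = {("Barack Obama", "Honolulu"): "born_in",
      ("Google", "Mountain View"): "headquartered_in"}

def weak_label(sentences):
    """Yield (sentence, relation) pairs via KB alignment."""
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in sent and e2 in sent:
                yield sent, rel  # heuristic: co-occurrence implies relation

corpus = ["Barack Obama was born in Honolulu in 1961.",
          "Google is headquartered in Mountain View, California."]

for sentence, relation in weak_label(corpus):
    print(relation, "<-", sentence)
```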
Linking entities with a knowledge base (entity linking) is a key issue in bridging textual data with a structured knowledge base. Due to the name variation problem and the name ambiguity problem, entity linking decisions depend critically on heterogeneous knowledge of entities. In this paper, we propose a generative probabilistic model, called the entity-mention model, which can leverage heterogeneous entity knowledge (including popularity knowledge, name knowledge and context knowledge) for the entity linking task. ...
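One plausible reading of such a generative model scores a candidate entity by a popularity prior times a name model times a context model. The toy sketch below assumes exactly that factorization, with invented placeholder distributions rather than the paper's estimated ones:

```python
import math

# Hedged sketch of a generative entity-mention score: an entity e is chosen by
# popularity P(e), emits a surface name P(s|e), and emits context words P(c|e).
# The three component distributions are toy placeholders.

def link_score(entity, name, context_words, popularity, name_model, context_model):
    log_p = math.log(popularity[entity])                    # popularity knowledge
    log_p += math.log(name_model[entity].get(name, 1e-9))   # name knowledge
    for w in context_words:                                 # context knowledge
        log_p += math.log(context_model[entity].get(w, 1e-9))
    return log_p

popularity = {"Michael Jordan (athlete)": 0.8, "Michael Jordan (scientist)": 0.2}
name_model = {"Michael Jordan (athlete)": {"Michael Jordan": 0.9, "MJ": 0.1},
              "Michael Jordan (scientist)": {"Michael Jordan": 0.95}}
context_model = {"Michael Jordan (athlete)": {"nba": 0.3, "bulls": 0.3},
                 "Michael Jordan (scientist)": {"machine": 0.3, "learning": 0.3}}

mention, ctx = "Michael Jordan", ["machine", "learning"]
best = max(popularity, key=lambda e: link_score(e, mention, ctx,
                                                popularity, name_model, context_model))
print(best)  # -> Michael Jordan (scientist)
```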
In this paper we give an overview of the Knowledge Base Population (KBP) track at the 2010 Text Analysis Conference. The main goal of KBP is to promote research in discovering facts about entities and augmenting a knowledge base (KB) with these facts. This is done through two tasks, Entity Linking – linking names in context to entities in the KB – and Slot Filling – adding information about an entity to the KB.
This paper describes a spoken dialog QA system as a substitute for call centers. The system is capable of conducting dialogs both for fixing speech recognition errors and for clarifying vague questions, based only on a large text knowledge base. We introduce two measures for making dialogs that fix recognition errors. An experimental evaluation shows the advantages of these measures.
We have developed an approach to natural language processing in which the natural language processor is viewed as a knowledge-based system whose knowledge is about the meanings of the utterances of its language. The approach is oriented around the phrase rather than the word as the basic unit. We believe that this paradigm for language processing not only extends the capabilities of other natural language systems, but handles those tasks that previous systems could perform in a more systematic and extensible manner. ...
We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.
Spoken Language Understanding (SLU) addresses the problem of extracting semantic meaning conveyed in an utterance. The traditional knowledge-based approach to this problem is very expensive -- it requires joint expertise in natural language processing and speech recognition, and best practices in language engineering for every new domain. On the other hand, a statistical learning approach needs a large amount of annotated data for model training, which is seldom available in practical applications outside of large research labs. ...
As the first step in an automated text summarization algorithm, this work presents a new method for automatically identifying the central ideas in a text based on a knowledge-based concept counting paradigm. To represent and generalize concepts, we use the hierarchical concept taxonomy WordNet. By setting appropriate cutoff values for such parameters as concept generality and child-to-parent frequency ratio, we control the amount and level of generality of concepts extracted from the text.
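A rough sketch of concept counting over WordNet's noun taxonomy, assuming a first-sense heuristic and an illustrative child-to-parent ratio cutoff of 0.5 (the paper's actual parameter values are not reproduced here); requires NLTK's WordNet data:

```python
from collections import Counter
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# Hedged sketch of knowledge-based concept counting: word occurrences are
# propagated up WordNet's hypernym taxonomy, and a concept is kept as a
# "central idea" candidate if it accounts for a large enough share of its
# parent's count. The 0.5 cutoff is an illustrative value.

def concept_counts(words):
    counts = Counter()
    for word in words:
        for synset in wn.synsets(word, pos=wn.NOUN)[:1]:   # first-sense heuristic
            node = synset
            while node:                                    # walk up the taxonomy
                counts[node] += 1
                parents = node.hypernyms()
                node = parents[0] if parents else None
    return counts

def central_concepts(counts, ratio_cutoff=0.5):
    kept = []
    for synset, c in counts.items():
        for parent in synset.hypernyms():
            if counts[parent] and c / counts[parent] >= ratio_cutoff:
                kept.append(synset)
                break
    return kept

words = ["dog", "cat", "wolf", "horse", "poodle"]
for s in central_concepts(concept_counts(words)):
    print(s.name())
```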
Knowledge-Based Report Generation is a technique for automatically generating natural language reports from computer databases. It is so named because it applies knowledge-based expert systems software to the problem of text generation. The first application of the technique, a system for generating natural language stock reports from a daily stock quotes database, is partially implemented.
A lexical knowledge base is a repository of computational information about concepts intended to be generally useful in many application areas including computational linguistics, artificial intelligence, and information science. It contains information derived from machine-readable dictionaries, the full text of reference books, the results of statistical analyses of text usages, and data manually obtained from human world knowledge.
AnswerBus News Engine is a question answering system using the contents of the CNN Web site as its knowledge base. Compared to other question answering systems, including its previous versions, it has a totally independent crawling and indexing system and a fully functioning search engine. Because of its dynamic and continuous indexing, it can answer questions about just-happened facts. It also achieves a high correct-answer rate. In this demonstration we will present the live system as well as its new technical features. ...
This demo abstract describes the SmartWeb Ontology-based Annotation system (SOBA). A key feature of SOBA is that all information is extracted and stored with respect to the SmartWeb Integrated Ontology (SWIntO). In this way, other components of the system, which use the same ontology, can access this information in a straightforward way. We will show how information extracted by SOBA is visualized within its original context, thus enhancing the browsing experience of the end user.
The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms that can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks. ...
We propose a Web-based method to generate large-scale encyclopedic knowledge, which is valuable for a wide range of NLP research. We first search the Web for pages containing a term in question. Then we use linguistic patterns and HTML structures to extract text fragments describing the term. Finally, we organize the extracted term descriptions based on word senses and domains. In addition, we apply an automatically generated encyclopedia to a question answering system targeting the Japanese Information Technology Engineers Examination. ...
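A hedged sketch of the middle step: definitional patterns over retrieved page text pick out candidate descriptions. The patterns and sample text are invented; the paper's actual patterns target Japanese and also exploit HTML structure, omitted here:

```python
import re

# Hedged sketch of pattern-based description extraction: given pages retrieved
# for a term, spans matching simple definitional patterns ("X is a ...",
# "X, a ...,") are kept as candidate descriptions. Patterns are illustrative.

PATTERNS = [r"{t}\s+is\s+(?:a|an|the)\s+.+?[.]",
            r"{t},\s+(?:a|an)\s+.+?,"]

def extract_descriptions(term, page_text):
    found = []
    for pat in PATTERNS:
        regex = re.compile(pat.format(t=re.escape(term)), re.IGNORECASE)
        found.extend(m.group(0) for m in regex.finditer(page_text))
    return found

page = ("XML is a markup language designed for storing and transporting data. "
        "It was defined by the W3C.")
print(extract_descriptions("XML", page))
```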
A sophisticated natural language system requires a large knowledge base. A methodology is described for constructing one in a principled way. Facts are selected for the knowledge base by determining what facts are linguistically presupposed by a text in the domain of interest. The facts are sorted into clusters, and within each cluster they are organized according to their logical dependencies. Finally, the facts are encoded as predicate calculus axioms.
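For example, a text in the domain of interest that mentions "the operating pressure of the pump" presupposes that pumps have pressures; a hypothetical predicate calculus encoding of that presupposed fact (invented for illustration, not taken from the paper) might be:

```latex
% Illustrative presupposed fact: every pump has an associated pressure.
\forall x\,\bigl(\mathit{pump}(x) \rightarrow
    \exists y\,(\mathit{pressure}(y) \land \mathit{of}(y,x))\bigr)
```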
This paper describes an "intelligent" tutor of foreign language concepts and skills based upon state-of-the-art research in Intelligent Teaching Systems and Computational Linguistics. The tutor is part of a large R&D project in ITS which resulted in a system (called DART) for the design and development of intelligent teaching dialogues on PLATO, and in a program (called ELISA) for teaching foreign language conjunctions in context. ELISA was able to teach a few conjunctions in English, Dutch and Italian.
With globalisation and knowledge-based production, firms may cooperate on a global scale, outsource parts of their administrative or productive units and negate location altogether. The extremely low transaction costs of data, information and knowledge seem to invalidate the theory of agglomeration and the spatial clustering of firms, going back to the classical work by Alfred Weber (1868-1958) and Alfred Marshall (1842-1924), who emphasized the microeconomic benefits of industrial collocation.
Vietnam is embarking on a path towards a knowledge-based economy in which the emergence of knowledge clusters in Ho Chi Minh City and the Mekong Delta is playing a decisive role. As our paper suggests, clustering appears to have a positive effect not only on the increase of knowledge output, but also on the economic growth of these regions. Using a GIS-based mapping method, we can identify two major knowledge clusters: Ho Chi Minh City and Can Tho City.
We describe novel aspects of a new natural language generator called Nitrogen. This generator has a highly flexible input representation that allows a spectrum of input from syntactic to semantic depth, and shifts the burden of many linguistic decisions to the statistical post-processor. The generation algorithm is compositional, making it efficient, yet it also handles non-compositional aspects of language. Nitrogen's design makes it robust and scalable, operating with lexicons and knowledge bases of one hundred thousand entities. ...
A tool is described which helps in the creation, extension and updating of lexical knowledge bases (LKBs). Two levels of representation are distinguished: a static storage level and a dynamic knowledge level. The latter is an object-oriented environment containing linguistic and lexicographic knowledge. At the knowledge level, constructors and filters can be defined. Constructors are objects which extend the LKB both horizontally (new information) and vertically (new entries) using the linguistic knowledge.
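As a loose illustration of the constructor idea, the sketch below invents a toy object-oriented interface in which a morphological constructor extends an LKB vertically with derived entries; none of these class or rule names come from the described tool:

```python
# Hedged sketch of the knowledge-level constructor idea: a constructor object
# uses linguistic knowledge to extend an LKB vertically (new entries) or
# horizontally (new information on existing entries). All names are invented.

class LexicalKB:
    def __init__(self):
        self.entries = {}                            # lemma -> feature dict

    def add_entry(self, lemma, **features):          # vertical + horizontal extension
        self.entries.setdefault(lemma, {}).update(features)

class AgentNounConstructor:
    """Derive '-er' agent nouns from verbs: a toy morphological rule."""
    def apply(self, kb):
        for lemma, feats in list(kb.entries.items()):
            if feats.get("pos") == "verb":
                kb.add_entry(lemma + "er", pos="noun", derived_from=lemma)

kb = LexicalKB()
kb.add_entry("teach", pos="verb")
AgentNounConstructor().apply(kb)
print(kb.entries)   # 'teacher' added as a new entry derived from 'teach'
```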