This paper discusses research on distinguishing word meanings in the context of information retrieval systems. We conducted experiments with three sources of evidence for making these distinctions: morphology, part-of-speech, and phrases. We have focused on the distinction between homonymy and polysemy (unrelated vs. related meanings). Our results support the need to distinguish homonymy and polysemy. We found: 1) grouping morphological variants makes a significant improvement in retrieval performance, 2) that more than half of all words in a dictionary that differ...
Recipe 6.9: Retrieving Constraints from a SQL Server Database. Problem: You need to programmatically define constraints in a DataSet and retrieve constraint information defined in a SQL Server database. Solution: Use the INFORMATION_SCHEMA views and SQL Server system tables to get information about primary keys, foreign keys, and check constraints.
Recipe 10.3: Retrieving Column Default Values from SQL Server. Problem: The DataColumn object exposes a DefaultValue property. While the FillSchema( ) method of the DataAdapter returns schema information, it does not include the default values for columns.
Recipe 9.10: Retrieving a Single Value from a Query. Problem: Given a stored procedure that returns a single value, you need the fastest way to get this data. Solution: Use the ExecuteScalar( )
Information retrieval (IR) and figurative language processing (FLP) could scarcely be more different in their treatment of language and meaning. IR views language as an open-ended set of mostly stable signs with which texts can be indexed and retrieved, focusing more on a text’s potential relevance than its potential meaning. In contrast, FLP views language as a system of unstable signs that can be used to talk about the world in creative new ways.
The use of phrases in retrieval models has been proven to be helpful in the literature, but no particular research addresses the problem of discriminating phrases that are likely to degrade the retrieval performance from the ones that do not. In this paper, we present a retrieval framework that utilizes both words and phrases flexibly, followed by a general learning-to-rank method for learning the potential contribution of a phrase in retrieval.
This paper explores the role of information retrieval in answering “relationship” questions, a new class of complex information needs formally introduced in TREC 2005. Since information retrieval is often an integral component of many question answering strategies, it is important to understand the impact of different term-based techniques.
This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora and to phrasal translation, together with evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for acquiring bilingual terminology from comparable corpora and for disambiguating and selecting the best translation alternatives according to linguistics-based knowledge.
A layered approach to information retrieval permits the inclusion of multiple search engines as well as multiple databases, with a natural language layer to convert English queries for use by the various search engines. The NLP layer incorporates morphological analysis, noun phrase syntax, and semantic expansion based on WordNet.
This paper addresses the problem of automatically retrieving answers for how-to questions, focusing on those that inquire about the procedure for achieving a specific goal. For such questions, typical information retrieval methods, based on keyword matching, are better suited to detecting the content of the goal (e.g., ‘installing a Windows XP server’) than the general nature of the desired information (i.e., procedural, a series of steps for achieving this goal).
With the availability of large treebanks, retrieval techniques for highly structured data now become essential. In this contribution, we investigate the efficient retrieval of MT structures at the cost of a complex index, the Treegram Index. We illustrate our approach with the VENONA retrieval system, which handles the BHt (Biblia Hebraica transcripta) treebank comprising 508,650 phrase structure trees with maximum degree eight and maximum height 17, containing altogether 3.3 million Old-Hebrew words.
Astronomers are the oldest data collectors. The first catalogue of stars is due to
Hipparchus, in the second century B.C. Since that time, and more precisely since the end
of the last century, there has been an important increase in astronomical data. Due to the
development of space astronomy during recent decades, we have witnessed a veritable
Confronted with this flood of data, astronomers have to change their methodology.
It is necessary not only to manage large databases, but also to take into account recent
developments in information retrieval....
Most sentiment analysis approaches use a support vector machine (SVM) classifier with binary unigram weights as a baseline. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing.
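As a rough illustration of the weighting idea in the abstract above, the following minimal Python sketch combines a sublinear term frequency with a smoothed document frequency. The specific formulas (1 + ln(count) for tf, ln((1 + N) / (1 + df)) + 1 for idf), the function name `sublinear_tfidf`, and the toy documents are assumptions chosen for illustration; they are not necessarily the variants evaluated in the paper.

```python
import math
from collections import Counter

def sublinear_tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed formulas (illustrative only):
      tf  = 1 + ln(count)                  # sublinear term frequency
      idf = ln((1 + N) / (1 + df)) + 1     # smoothed document frequency
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (1.0 + math.log(count))
                  * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weighted

# Hypothetical toy corpus of two tokenized reviews.
docs = [
    "good good good movie".split(),
    "bad movie".split(),
]
weights = sublinear_tfidf(docs)
```

With the sublinear tf, a term that occurs three times receives less than three times the weight of a single occurrence, which is the dampening effect the abstract refers to.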
Community-based question answering (Q&A) has become an important issue due to the popularity of Q&A archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in Q&A archives aims to find historical questions that are semantically equivalent or relevant to the queried questions. In this paper, we propose a novel phrase-based translation model for question retrieval.
The Automatic Content Linking Device is a just-in-time document retrieval system which monitors an ongoing conversation or a monologue and enriches it with potentially related documents, including multimedia ones, from local repositories or from the Internet. The documents are found using keyword-based search or using a semantic similarity measure between documents and the words obtained from automatic speech recognition.
Statistical language modeling (SLM) has been used in many different domains for decades and has also been applied to information retrieval (IR) recently. Documents retrieved using this approach are ranked according to their probability of generating the given query. In this paper, we present a novel approach that employs the generalized Expectation Maximization (EM) algorithm to improve language models by representing their parameters as observation probabilities of Hidden Markov Models (HMM).
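The ranking principle mentioned in the abstract above (score each document by its probability of generating the query) can be sketched as follows. This is only a baseline unigram query-likelihood model with Jelinek-Mercer smoothing, which is an assumption made here for illustration; it does not implement the EM/HMM approach the paper proposes. The function name, the mixing weight `lam`, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """log P(query | doc) under a unigram model.

    Jelinek-Mercer smoothing (an assumption of this sketch) mixes the
    document model with the collection model using weight `lam`.
    """
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc)
        p_coll = collection_counts[term] / collection_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term never occurs in the collection
        score += math.log(p)
    return score

# Hypothetical toy collection of tokenized documents.
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "hidden markov models for speech".split(),
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_len = sum(len(d) for d in docs.values())

query = "retrieval models".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(
        query, docs[name], collection_counts, collection_len
    ),
    reverse=True,
)
```

Smoothing keeps a document from receiving zero probability just because it lacks one query term, so `d2` still gets a finite score even though it never mentions "retrieval".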
In this paper, we introduce a multilingual access and retrieval system with enhanced query translation and multilingual document retrieval, by mining bilingual terminologies and aligned documents directly from the set of comparable corpora which are to be searched by users. Extracting bilingual terminologies and aligning bilingual documents with similar content prior to the search process provides more accurate translated terms for the in-domain data and supports multilingual retrieval even without the use of a translation tool at retrieval time...