Information retrieval

  • This paper discusses research on distinguishing word meanings in the context of information retrieval systems. We conducted experiments with three sources of evidence for making these distinctions: morphology, part-of-speech, and phrases. We have focused on the distinction between homonymy and polysemy (unrelated vs. related meanings). Our results support the need to distinguish homonymy and polysemy. We found: 1) grouping morphological variants makes a significant improvement in retrieval performance, 2) that more than half of all words in a dictionary that differ...

  • Information retrieval (IR) and figurative language processing (FLP) could scarcely be more different in their treatment of language and meaning. IR views language as an open-ended set of mostly stable signs with which texts can be indexed and retrieved, focusing more on a text’s potential relevance than its potential meaning. In contrast, FLP views language as a system of unstable signs that can be used to talk about the world in creative new ways.

  • This paper explores the role of information retrieval in answering “relationship” questions, a new class of complex information needs formally introduced in TREC 2005. Since information retrieval is often an integral component of many question answering strategies, it is important to understand the impact of different term-based techniques.

  • This paper presents an approach to bilingual lexicon extraction from non-aligned comparable corpora and to phrasal translation, together with evaluations on Cross-Language Information Retrieval. A two-stage translation model is proposed for the acquisition of bilingual terminology from comparable corpora, followed by disambiguation and selection of the best translation alternatives according to linguistics-based knowledge.

  • Astronomers are the oldest data collectors. The first catalogue of stars is due to Hipparchus, in the second century B.C. Since that time, and more precisely since the end of the last century, there has been an important increase in astronomical data. Due to the development of space astronomy during recent decades, we have witnessed a veritable inflation. Confronted with this flood of data, astronomers have to change their methodology. It is necessary not only to manage large databases, but also to take into account recent developments in information retrieval....

  • Most sentiment analysis approaches use a support vector machine (SVM) classifier with binary unigram weights as a baseline. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing.

  • Statistical language modeling (SLM) has been used in many different domains for decades and has also been applied to information retrieval (IR) recently. Documents retrieved using this approach are ranked according to their probability of generating the given query. In this paper, we present a novel approach that employs the generalized Expectation Maximization (EM) algorithm to improve language models by representing their parameters as observation probabilities of Hidden Markov Models (HMM).

  • This paper explores the relationship between the translation quality and the retrieval effectiveness in Machine Translation (MT) based Cross-Language Information Retrieval (CLIR). To obtain MT systems of different translation quality, we degrade a rule-based MT system by decreasing the size of the rule base and the size of the dictionary. We use the degraded MT systems to translate queries and submit the translated queries of varying quality to the IR system. Retrieval effectiveness is found to correlate highly with the translation quality of the queries.

  • We discuss a semi-interactive approach to information retrieval which consists of two tasks performed in a sequence. First, the system assists the searcher in building a comprehensive statement of information need, using automatically generated topical summaries of sample documents.

  • Previous comparisons of document and query translation suffered difficulty due to differing quality of machine translation in these two opposite directions. We avoid this difficulty by training identical statistical translation models for both translation directions using the same training data. We investigate information retrieval between English and French, incorporating both translation directions into both document translation and query translation-based information retrieval, as well as into hybrid systems. ...

  • This paper deals with translation ambiguity and target polysemy problems together. Two monolingual balanced corpora are employed to learn word co-occurrence for translation ambiguity resolution, and augmented translation restrictions for target polysemy resolution. Experiments show that the model achieves 62.92% of monolingual information retrieval performance, a 40.80% gain over the select-all model. Adding target polysemy resolution increases retrieval performance by about 10.11% over the model that resolves translation ambiguity only. ...

  • We developed a prototype information retrieval system which uses advanced natural language processing techniques to enhance the effectiveness of traditional key-word based document retrieval. The backbone of our system is a statistical retrieval engine which performs automated indexing of documents, then search and ranking in response to user queries. This core architecture is augmented with advanced natural language processing tools which are both robust and efficient.

  • This paper proposes a method to overcome the drawbacks of WordNet when applied to information retrieval by complementing it with Roget's thesaurus and corpus-derived thesauri. Words and relations which are not included in WordNet can be found in the corpus-derived thesauri. Effects of polysemy can be minimized with a weighting method that considers all query terms and all of the thesauri. Experimental results show that our method enhances information retrieval performance significantly.

  • This chapter covers decision support, including online analytical processing, data mining, and information retrieval. It introduces data analysis and mining with the following contents: relevance ranking using terms; relevance using hyperlinks; synonyms, homonyms, and ontologies; indexing of documents; measuring retrieval effectiveness; web search engines; information retrieval and structured data; directories.

  • Previous research has reached conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and also exploit the integration of synonym relations.

  • We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there exists a directly proportional relation between the frequency of POS blocks and their content salience. We also hypothesise that the class membership of the parts of speech within such blocks reflects the content load of the blocks, on the basis that open class parts of speech are more content-bearing than closed class parts of speech.

  • We report on the development of a new automatic feedback model to improve information retrieval in digital libraries. Our hypothesis is that some particular sentences, selected based on argumentative criteria, can be more useful than others to perform well-known feedback information retrieval tasks.

  • Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval.

  • The representation of whole texts is a major concern of the field known as information retrieval (IR), an important aspect of which might more precisely be called 'document retrieval' (DR). The DR situation, with which we will be concerned, is, in general, the following: a. A user, recognizing an information need, presents to an IR mechanism (i.e., a collection of texts, with a set of associated activities for representing, storing, matching, etc.) a request based upon that need, hoping that the mechanism will be able to satisfy that need. ...

  • The paper presents the state of elaboration of the natural language information retrieval system DIALOG. Its aim is an automatic, conversational extraction of facts from a given text. Actually it is a real medical text on gastroenterology, which was prepared by a team of specialists. The system has a modular structure. The first, and in fact very important, module is the language analysis module. Its task is to ensure the transition of a medical text from its natural form, i.e. sentences formed by physicians, into a formal logical notation. ...
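
One of the abstracts above mentions tf.idf variants that use a sublinear function for term frequency and smoothing for document frequency. The following is a minimal sketch of what such a weighting might look like. The function name `sublinear_tfidf`, the toy documents, and the exact formulas (tf = 1 + ln(count), idf = ln((1 + N) / (1 + df)) + 1) are assumptions chosen for illustration; they are one common variant and not necessarily the scheme evaluated in that paper.

```python
import math
from collections import Counter


def sublinear_tfidf(docs):
    """Weight every term in every tokenized document.

    Assumed formulas (a common variant, not taken from the paper):
      tf  = 1 + ln(count)               sublinear term frequency
      idf = ln((1 + N) / (1 + df)) + 1  smoothed document frequency
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (1.0 + math.log(count))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of two tokenized reviews.
docs = [["good", "good", "film"], ["bad", "film"]]
weights = sublinear_tfidf(docs)
print(weights[0])
```

With this variant, a term that occurs twice in a document does not receive twice the weight of a term that occurs once, and a term that appears in every document still keeps a small positive weight because of the smoothing.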

