To accomplish the sad duty of paying homage to Alain Glavieux, I have referred
to his biography as much as my own memories. Two points of this biography struck
me, although I had hardly paid attention to them until now. I first noted that Alain
Glavieux, born in 1949, is the exact contemporary of information theory, since it
was based on the articles of Shannon in 1948 and 1949. I also noted that his first
research at the Ecole Nationale Supérieure de Télécommunications de Bretagne
(ENST Brittany) related to underwater acoustic communications. ...
This paper presents a detailed study of the integration of knowledge from both dependency parses and hierarchical word ontologies into a maximum-entropy-based tagging model that simultaneously labels words with both syntax and semantics. Our findings show that information from both these sources can lead to strong improvements in overall system accuracy: dependency knowledge improved performance over all classes of word, and knowledge of the position of a word in an ontological hierarchy increased accuracy for words not seen in the training data. ...
Automatic segmentation is important for making multimedia archives comprehensible, and for developing downstream information retrieval and extraction modules. In this study, we explore approaches that can segment multiparty conversational speech by integrating various knowledge sources (e.g., words, audio and video recordings, speaker intention and context). In particular, we evaluate the performance of a Maximum Entropy approach, and examine the effectiveness of multimodal features on the task of dialogue segmentation. ...
In this paper we compare two approaches to natural language understanding (NLU). The first approach is derived from the field of statistical machine translation (MT), whereas the other uses the maximum entropy (ME) framework. Starting with an annotated corpus, we describe the problem of NLU as a translation from a source sentence to a formal language target sentence. We mainly focus on the quality of the different alignment and ME models and show that the direct ME approach outperforms the alignment templates method. ...
This book is devoted to the theory of probabilistic information measures and
their application to coding theorems for information sources and noisy
channels. The eventual goal is a general development of Shannon's mathematical
theory of communication, but much of the space is devoted to the tools and
methods required to prove the Shannon coding theorems. These tools form an
area common to ergodic theory and information theory and comprise several
quantitative notions of the information in random variables, random processes,
and dynamical systems....
A discrete source generates three independent symbols A, B and C with probabilities 0.9, 0.08 and 0.02 respectively. a) Determine the entropy of the source. b) Determine the redundancy of the source. ...
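The exercise above can be checked numerically. The sketch below assumes base-2 logarithms (entropy in bits) and the common definition of redundancy as 1 − H/H_max, where H_max = log2(3) is the entropy of three equiprobable symbols; the exercise itself does not state which convention it uses.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.9, 0.08, 0.02]        # P(A), P(B), P(C)
H = entropy(probs)               # entropy of the source, ~0.541 bits/symbol
H_max = math.log2(len(probs))    # maximum entropy for 3 symbols, ~1.585 bits
redundancy = 1 - H / H_max       # ~0.659
```

The skewed distribution (A dominates with probability 0.9) is what drives the entropy well below its 1.585-bit maximum and the redundancy close to 66%.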
We present a framework for statistical machine translation of natural languages based on direct maximum entropy models, which contains the widely used source-channel approach as a special case. All knowledge sources are treated as feature functions, which depend on the source language sentence, the target language sentence and possible hidden variables. This approach allows a baseline machine translation system to be extended easily by adding new feature functions. We show that a baseline statistical machine translation system is significantly improved using this approach. ...
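The direct maximum entropy formulation described above is a log-linear model: each knowledge source is a feature function h_m(f, e) over a source sentence f and a target candidate e, combined with weights λ_m as P(e|f) ∝ exp(Σ_m λ_m h_m(f, e)). The two toy feature functions and the weights below are purely illustrative assumptions, not the paper's actual features; in practice the weights are tuned on held-out data.

```python
import math

# Hypothetical feature functions h_m(f, e): each scores a
# (source sentence f, target candidate e) pair.
def length_ratio(f, e):
    # Penalize candidates whose length differs from the source's.
    return -abs(len(f.split()) - len(e.split()))

def word_overlap(f, e):
    # Count shared word types (a crude adequacy signal).
    return len(set(f.split()) & set(e.split()))

features = [length_ratio, word_overlap]
weights = [0.5, 1.0]  # lambda_m; illustrative values only

def score(f, e):
    """Unnormalized log-linear score: sum_m lambda_m * h_m(f, e)."""
    return sum(w * h(f, e) for w, h in zip(weights, features))

def posterior(f, candidates):
    """P(e|f) = exp(score(f, e)) / Z, normalized over the candidate list."""
    scores = [score(f, e) for e in candidates]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]
```

Adding a new knowledge source amounts to appending one function to `features` and one weight to `weights`, which is the extensibility the abstract highlights; the source-channel model falls out as the special case where the features are the log language-model and log translation-model probabilities.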
This paper extends previous work on extracting parallel sentence pairs from comparable data (Munteanu and Marcu, 2005). For a given source sentence S, a maximum entropy (ME) classifier is applied to a large set of candidate target translations. A beam-search algorithm is used to abandon target sentences as non-parallel early on during classification if they fall outside the beam. This way, our novel algorithm avoids any document-level prefiltering step.
There is a continuing need to use recent and consistent multisectoral economic data to support
policy analysis and the development of economywide models. Updating and estimating input-output
tables and social accounting matrices (SAMs), which provide the underlying data
framework for this type of model and analysis, for a recent year is a difficult and challenging
problem. Typically, input-output data are collected at long intervals (usually five years or more),
while national income and product data are available annually, but with a lag.