This monograph presents methods for full comparative distributional analysis based on the relative distribution. This provides a general integrated framework for analysis, a graphical component that simplifies exploratory data analysis and display, a statistically valid basis for the development of hypothesis-driven summary measures, and the potential for decomposition, enabling the examination of complex hypotheses regarding the origins of distributional changes within and between groups.
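As an illustration of the central idea, the following minimal Python sketch computes "relative data": each observation from a comparison group is mapped to its rank in the empirical distribution of a reference group. The function names, the empirical-CDF estimator, and the simple histogram are assumptions made for this sketch and are not taken from the monograph.

```python
from bisect import bisect_right


def relative_data(reference, comparison):
    """Map each comparison value y to F0(y), the empirical CDF of the reference group."""
    if not reference:
        raise ValueError("reference sample must not be empty")
    ref_sorted = sorted(reference)
    n = len(ref_sorted)
    # bisect_right counts reference values <= y, so the ratio is the empirical CDF at y.
    return [bisect_right(ref_sorted, y) / n for y in comparison]


def relative_histogram(rel, bins=10):
    """Histogram of relative data, scaled so that a value of 1.0 means 'same as reference'."""
    counts = [0] * bins
    for r in rel:
        idx = min(int(r * bins), bins - 1)  # r == 1.0 falls into the last bin
        counts[idx] += 1
    total = len(rel)
    return [c * bins / total for c in counts]
```

If the two groups had the same distribution, the relative data would be roughly uniform on [0, 1] and every histogram bar would sit near 1.0; bars above or below 1.0 indicate where the comparison group is over- or under-represented relative to the reference.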
A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods which only use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space.
Distributional similarity has been widely used to capture the semantic relatedness of words in many NLP tasks. However, various parameters such as similarity measures must be hand-tuned to make it work effectively. Instead, we propose a novel approach to synonym identification based on supervised learning and distributional features, which correspond to the commonality of individual context types shared by word pairs. Considering the integration with pattern-based features, we have built and compared five synonym classifiers. ...
Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. ...
Iterative bootstrapping algorithms are typically compared using a single set of handpicked seeds. However, we demonstrate that performance varies greatly depending on these seeds, and favourable seeds for one algorithm can perform very poorly with others, making comparisons unreliable. We exploit this wide variation with bagging, sampling from automatically extracted seeds to reduce semantic drift. However, semantic drift still occurs in later iterations.
In this paper we investigate Chinese-English name transliteration using comparable corpora, that is, corpora where texts in the two languages deal with some of the same topics (and therefore share references to named entities) but are not translations of each other. We present two distinct methods for transliteration, one approach using phonetic transliteration, and the second using the temporal distribution of candidate pairs. Each of these approaches works quite well, but by combining the approaches one can achieve even better results.
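The temporal-distribution idea can be sketched as follows: if two names refer to the same entity, their frequencies over time in the two corpora should rise and fall together. The abstract does not say which similarity measure is used, so the Pearson correlation below, the function name, and the per-day count lists are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length frequency series."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts of one source-language name and two candidate target names.
source_counts = [0, 5, 1, 0]
candidate_a = [0, 4, 2, 0]   # peaks on the same day as the source name
candidate_b = [3, 0, 0, 3]   # peaks on different days
```

Under this sketch, `candidate_a` would be ranked above `candidate_b` because its counts correlate positively with those of the source name.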
There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple languages and compare our method with a monolingual syntax-based method.
In this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. We present a novel precision-oriented algorithm that relies on per-topic word distributions obtained by the bilingual LDA (BiLDA) latent topic model. The algorithm aims at harvesting only the most probable word translations across languages in a greedy fashion, without any prior knowledge about the language pair, relying on a symmetrization process and the one-to-one constraint.
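A rough sketch of greedy harvesting under symmetrization and a one-to-one constraint is given below. It assumes that a similarity score for each candidate word pair has already been derived from the per-topic word distributions; the scoring itself, the function name `harvest_translations`, the `min_score` threshold, and the toy scores are hypothetical and not taken from the paper.

```python
def harvest_translations(scores, min_score=0.0):
    """Greedily keep word pairs that are each other's best match (one-to-one).

    scores: dict mapping (source_word, target_word) -> similarity score.
    """
    remaining = {pair: s for pair, s in scores.items() if s >= min_score}
    lexicon = {}
    while remaining:
        best_tgt, best_src = {}, {}
        for (src, tgt), s in remaining.items():
            if src not in best_tgt or s > remaining[(src, best_tgt[src])]:
                best_tgt[src] = tgt
            if tgt not in best_src or s > remaining[(best_src[tgt], tgt)]:
                best_src[tgt] = src
        # Symmetrization: accept a pair only if the preference holds in both directions.
        mutual = {s: t for s, t in best_tgt.items() if best_src[t] == s}
        if not mutual:
            break  # only possible with tied scores; stop rather than guess
        lexicon.update(mutual)
        used_targets = set(mutual.values())
        # One-to-one constraint: matched words leave the candidate pool.
        remaining = {
            (s, t): v
            for (s, t), v in remaining.items()
            if s not in mutual and t not in used_targets
        }
    return lexicon


# Toy scores for illustration only.
example_scores = {
    ("dog", "cane"): 0.9,
    ("dog", "gatto"): 0.2,
    ("cat", "gatto"): 0.8,
    ("cat", "cane"): 0.3,
    ("house", "cane"): 0.4,
}
```

With a threshold of 0.35, "house" stays untranslated in this toy example because its only surviving candidate is already claimed by a stronger mutual match, which is the precision-over-recall behaviour the abstract describes.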
This paper discusses the results of a comparative study of distributional equivalences among adjectivals in four Slavic languages, namely, Russian, Czech, Polish and Serbo-Croatian. A procedure for determining equivalence is defined, and is applied to the results of analyzing the adjectivals of each language with respect to gender, animateness, and case and number.
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O(n^2) in the vocabulary size). In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance. ...
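The quadratic cost of the naïve approach is easy to see in a brute-force sketch: every word's context vector is compared with every other word's. The sparse-dictionary representation, the choice of cosine similarity, and the function names are assumptions for this illustration; the paper's own measures and the SASH index are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def naive_nearest_neighbours(vectors, k=1):
    """For each word, rank all other words by similarity: n * (n - 1) comparisons."""
    neighbours = {}
    for word, vec in vectors.items():
        scored = [(cosine(vec, other), w) for w, other in vectors.items() if w != word]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties alphabetical
        neighbours[word] = [w for _, w in scored[:k]]
    return neighbours


# Toy context vectors (context word -> count), for illustration only.
toy_vectors = {
    "car": {"drive": 2, "road": 1},
    "automobile": {"drive": 1, "road": 1},
    "banana": {"eat": 3},
}
```

The nested loop over the vocabulary is what approximate methods try to avoid: they trade some accuracy for visiting only a fraction of the candidate neighbours.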
A distributional method for part-of-speech induction is presented which, in contrast to most previous work, determines the part-of-speech distribution of syntactically ambiguous words without explicitly tagging the underlying text corpus. This is achieved by assuming that the word pair consisting of the left and right neighbor of a particular token is characteristic of the part of speech at this position, and by clustering the neighbor pairs on the basis of their middle words as observed in a large corpus.
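The data-collection step described above can be sketched as follows: for every token position, record the (left neighbour, right neighbour) pair and count which middle words occur between them. The function name and the toy sentence are assumptions, and the clustering step that follows in the paper is not shown here.

```python
from collections import Counter, defaultdict


def neighbour_pair_profiles(tokens):
    """Map each (left, right) neighbour pair to a Counter of the middle words seen between them."""
    profiles = defaultdict(Counter)
    for left, middle, right in zip(tokens, tokens[1:], tokens[2:]):
        profiles[(left, right)][middle] += 1
    return profiles


# Toy corpus for illustration only.
toy_tokens = "the dog barks the cat barks".split()
toy_profiles = neighbour_pair_profiles(toy_tokens)
```

Neighbour pairs whose middle-word profiles look alike would then be grouped together, on the assumption that they frame the same part of speech.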
This paper quantitatively investigates to what extent local context is useful for disambiguating the senses of an ambiguous word. This is done by comparing the co-occurrence frequencies of particular context words. First, one context word representing a certain sense is chosen, and then the co-occurrence frequencies with two other context words, one of the same and one of another sense, are compared. As expected, it turns out that context words belonging to the same sense have considerably higher co-occurrence frequencies than words belonging to different senses. ...
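The comparison can be sketched with a simple windowed co-occurrence count. The window size, the function names, and the toy token sequence (built around the ambiguous word "bank") are assumptions for illustration and do not reproduce the paper's data or settings.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unordered word pairs that occur within `window` tokens of each other."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            counts[tuple(sorted((word, tokens[j])))] += 1
    return counts


def cooccurrence(counts, a, b):
    """Look up the count for a pair regardless of argument order."""
    return counts[tuple(sorted((a, b)))]


# Toy text for illustration only: "money" and "account" share the financial sense
# of "bank", while "river" belongs to the other sense.
toy_tokens = "money account bank money account river water river bank".split()
toy_counts = cooccurrence_counts(toy_tokens, window=2)
```

Here "money" would be the chosen context word, and its count with "account" (same sense) is compared against its count with "river" (other sense).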
In this paper, we present a feature-based method to align documents with similar content across two sets of bilingual comparable corpora from daily news texts. We evaluate the contribution of each individual feature and investigate the incorporation of these diverse statistical and heuristic features for the task of bilingual document alignment. Experimental results on the English-Chinese and English-Malay comparable news corpora show that our proposed Discrete Fourier Transform-based term frequency distribution feature is very effective. ...
In this chapter, the learning objectives are: Determine the tax consequences to the buyer and seller of the disposition of a partnership interest, including the amount and character of gain or loss recognized; list the reasons for distributions, and compare operating and liquidating distributions; determine the tax consequences of proportionate operating distributions;…
This chapter examines the various issues related to the process of moving a product from one country to another, beginning by comparing and contrasting the major transportation modes. The discussion then focuses on insurance and packing for export.
My intention in this textbook is to provide a self-contained exposition of the fundamentals
and applications of statistical thermodynamics for beginning graduate students in the engineering
sciences. Especially within engineering, most students enter a course in statistical
thermodynamics with limited exposure to statistics, quantum mechanics, and spectroscopy.
Hence, I have found it necessary over the years to “start from the beginning,” not leaving
out intermediary steps and presuming little knowledge in the discrete, as compared to
the continuum, domain of physics.
Finally, the last part of the questionnaire aimed to acquire information on the most innovative
examples of financing mechanisms used in the EU countries. The goal for this was to provide
innovative examples of financing mechanisms, which would be used for detailed analysis and
material for a multi-criteria analysis (MCA) of alternative financing mechanisms. In total, 35 cases of
financing mechanisms were reported from 13 countries.
Latin America is often singled out for its high and persistent income inequality. Toward the end of the 1990s, however, income concentration began to fall across the region. Of the seventeen countries for which comparable data are available, twelve have experienced a decline, particularly since 2000. This book is among the first efforts to understand what happened in these countries and why.
The topic ‘elite’ may be dealt with either in a few lines, or in many pages.
There is no half way. In fact, it encompasses issues which are crucial to
the social sciences, such as the relation between the distribution of wealth,
prestige and power; the exercise of power and the composition of the group
that holds it. The list is extensive.