Factored Statistical Machine Translation extends the phrase-based SMT model by allowing each word to be a vector of factors. Experiments have shown the effectiveness of many factors, including part-of-speech tags, in improving the grammaticality of the output. However, high-quality part-of-speech taggers are not available in the open domain for many languages.
We present the first known result of high-precision bilingual extraction of rare words from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, in a machine learning approach. We test our hypothesis on different pairs of languages and corpora.
A term-list is a list of content words that characterize a consistent text or a concept. This paper presents a new method for translating a term-list by using a corpus in the target language. The method first retrieves alternative translations for each input word from a bilingual dictionary. It then determines the most 'coherent' combination of alternative translations, where the coherence of a set of words is defined as the proximity among multi-dimensional vectors produced from the words on the basis of co-occurrence statistics. ...
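The selection step described above can be illustrated with a minimal sketch. It assumes that coherence is the mean pairwise cosine similarity between sparse co-occurrence vectors and that all combinations are searched exhaustively; the abstract specifies neither, and the function names and toy data are hypothetical.

```python
from itertools import combinations, product
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(words, vectors):
    """Assumed coherence measure: mean pairwise cosine similarity."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def translate_term_list(candidates, vectors):
    """Pick one translation per input word so the chosen set is most coherent.

    candidates: one list of alternative translations per input word
                (as retrieved from a bilingual dictionary).
    vectors:    co-occurrence vector for each target-language word.
    """
    return max(product(*candidates), key=lambda combo: coherence(combo, vectors))


# Toy data (hypothetical): two ambiguous input words, two alternatives each.
candidates = [["bank", "shore"], ["interest", "curiosity"]]
vectors = {
    "bank": {"money": 3.0, "loan": 2.0},
    "shore": {"river": 3.0, "water": 2.0},
    "interest": {"money": 2.0, "loan": 3.0},
    "curiosity": {"book": 2.0, "water": 1.0},
}
best = translate_term_list(candidates, vectors)
print(best)
```

Exhaustive search grows exponentially with the length of the term-list, so a practical system would need pruning or a greedy approximation.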
We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases. ...
In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model.
This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance.
It exploits a vector-space model developed in information retrieval research. We present a preliminary result from our computational experiment.
Introduction
Many machine translation systems have been developed and commercialized. When these systems are faced with unknown domains, however, their performance degrades. Although there are several reasons behind this poor performance, in this paper, we concentrate on one of the major problems, i.e., building a bilingual dictionary for transfer.
Term translation probabilities have proved an effective method of semantic smoothing in the language modelling approach to information retrieval tasks. In this paper, we use Generalized Latent Semantic Analysis to compute semantically motivated term and document vectors. The normalized cosine similarity between the term vectors is used as term translation probability in the language modelling framework. Our experiments demonstrate that GLSA-based term translation probabilities capture semantic relations between terms and improve performance on document classification. ...
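One way to read "normalized cosine similarity used as term translation probability" is sketched below. The abstract does not give the normalization, so this sketch assumes negative similarities are clipped to zero and the remaining scores are divided by their sum over the vocabulary; the function names and the toy vectors are hypothetical and do not come from GLSA itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translation_probs(term, term_vectors):
    """Turn cosine similarities to `term` into a distribution over the vocabulary.

    Assumption: negative similarities are clipped to 0 before normalizing.
    """
    sims = {
        other: max(cosine(term_vectors[term], vec), 0.0)
        for other, vec in term_vectors.items()
    }
    total = sum(sims.values())
    if total == 0.0:  # zero vector: no similarity mass to distribute
        return {other: 0.0 for other in sims}
    return {other: s / total for other, s in sims.items()}


# Toy term vectors (hypothetical values, standing in for learned term vectors).
term_vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
probs = translation_probs("car", term_vectors)
print(probs)
```

In this toy vocabulary, nearly all probability mass for "car" is shared between "car" and "automobile", which is the kind of semantic smoothing the abstract describes.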
As science has become more interdisciplinary and impinges ever more heavily on
technology, we have been led to the conclusion that there is a great need now for a
textbook that emphasizes the physical and chemical origins of the properties of solids
while at the same time focusing on the technologically important materials that are
being developed and used by scientists and engineers.
An invariant object recognition system needs to be able to recognise the object under
any usual a priori defined distortions such as translation, scaling and in‐plane and out‐of‐plane
rotation. Ideally, the system should be able to recognise (detect and classify)
any complex scene of objects even within background clutter noise. This problem is a
very complex and difficult one. In this book, we present recent advances towards
achieving fully‐robust object recognition.
A collection of medical research reports published in international medical journals, offering readers knowledge of the medical field. Topic: Coordinate enhancement of transgene transcription and translation in a lentiviral vector
where $b_i$ is a block, $b_{i-1}$ is its predecessor block, and $o_i \in \{L(\text{eft}), R(\text{ight}), N(\text{eutral})\}$ is a three-valued orientation component linked to the block $b_i$: a block $b_i$ is generated to the left or the right of its predecessor block $b_{i-1}$, where the orientation $o_{i-1}$ of the predecessor block is ignored. Here, $n$ is the number of blocks in the translation. We are interested in learning the weight vector $w$ from the training data.