Machine representation

  • In this paper we present an ambiguity preserving translation approach which transfers ambiguous LFG f-structure representations. It is based on packed f-structure representations which are the result of potentially ambiguous utterances. If the ambiguities between source and target language can be preserved, no unpacking during transfer is necessary and the generator may produce utterances which maximally cover the underlying ambiguities.

  • We illustrate and explain problems of n-gram-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric SemPOS based on the deep-syntactic representation of the sentence tackles the issue and retains the performance for translation to English as well.

  • In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling.

  • This paper introduces a machine learning method based on Bayesian networks which is applied to the mapping between deep semantic representations and lexical semantic resources. A probabilistic model comprising Minimal Recursion Semantics (MRS) structures and lexicalist-oriented semantic features is acquired. Lexical semantic roles enriching the MRS structures are inferred, which are useful to improve the accuracy of deep semantic parsing.

  • A minimally supervised machine learning framework is described for extracting relations of various complexity. Bootstrapping starts from a small set of n-ary relation instances as “seeds”, in order to automatically learn pattern rules from parsed data, which then can extract new instances of the relation and its projections. We propose a novel rule representation enabling the composition of n-ary relation rules on top of the rules for projections of the relation.

  • there is no difference between source and target representation; in a transfer-based system, the step between the two is usually called transfer, and this step is meant to be as simple as possible. The research described was originally done in the framework of the EUROTRA M pro...

  • A new approach to structure-driven generation is presented that is based on a separate semantics as input structure. For the first time, a GPSG-based formalism is complemented with a system of pattern-action rules that relate the parts of a semantics to appropriate syntactic rules. This way a front end generator can be adapted to some application system (such as a machine translation system) more easily than would be possible with many previous generators based on modern grammar formalisms.

  • Introduction to machine learning: definitions of machine learning, representation of the learning problem, application examples of ML, key elements of an ML problem, issues in machine learning, types of learning problems.

  • Necessity of Introducing Some Information Provided by Transformational Analysis into MT Algorithms. Irena Bellert, Department of English Philology, Warsaw University. A few examples of ambiguous English constructions and their Polish equivalents are discussed in terms of the correlation between their respective phrase-marker representations and transformational analyses.

  • Statistical machine translation is quite robust when it comes to the choice of input representation. It only requires consistency between training and testing. As a result, there is a wide range of possible preprocessing choices for data used in statistical machine translation. This is even more so for morphologically rich languages such as Arabic. In this paper, we study the effect of different word-level preprocessing schemes for Arabic on the quality of phrase-based statistical machine translation. ...

  • Statistical machine translation systems are based on one or more translation models and a language model of the target language. While many different translation models and phrase extraction algorithms have been proposed, a standard word n-gram back-off language model is used in most systems. In this work, we propose to use a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network is used to perform the projection and the probability estimation. ...

  • The LOGON MT demonstrator assembles independently valuable general-purpose NLP components into a machine translation pipeline that capitalizes on output quality. The demonstrator embodies an interesting combination of hand-built, symbolic resources and stochastic processes. h1, { h1:proposition_m(h3), h4:proper_q(x5, h6, h7), h8:named(x5, ‘Bodø’), h9:_populate_v(e2, , x5), h9:_densely_r(e2) }, { h3 =q h9, h6 =q h8 } Figure 1: Simplified MRS representation for the utterance ‘Bodø is densely populated.’

  • This set of candidate surface strings, represented as a word lattice, is then rescored by a word-bigram language model, to produce the best-ranked output sentence. FERGUS (Bangalore and Rambow, 2000), on the other hand, employs a model of syntactic structure during sentence realization. In simple terms, it adds a tree-based stochastic model to the approach taken by the Nitrogen system. This tree-based model chooses a best-ranked XTAG representation for a given dependency structure.

  • This paper discusses how a two-level knowledge representation model for machine translation integrates aspectual information with lexical-semantic information by means of parameterization. The integration of aspect with lexical-semantics is especially critical in machine translation because of the lexical selection and aspectual realization processes that operate during the production of the target-language sentence: there are often a large number of lexical and aspectual possibilities to choose from in the production of a sentence from a lexical semantic representation. ...

  • Machine translation (MT) has recently been formulated in terms of constraint-based knowledge representation and unification theories, but it is becoming more and more evident that it is not possible to design a practical MT system without an adequate method of handling mismatches between semantic representations in the source and target languages. In this paper, we introduce the idea of "information-based" MT, which is considerably more flexible than interlingual MT or the conventional transfer-based MT.

  • We present a language model consisting of a collection of costed bidirectional finite state automata associated with the head words of phrases. The model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations. We also present a model and algorithm for machine translation involving optimal "tiling" of a dependency tree with entries of a costed bilingual lexicon. Experimental results are reported comparing methods for assigning cost functions to these models.

  • This paper describes the design of a prototype machine translation system for a sublanguage of job advertisements. The design is based on the hypothesis that specialized linguistic subsystems may require special computational treatment and that therefore a relatively shallow analysis of the text may be sufficient for automatic translation of the sublanguage. This hypothesis and the desire to minimize computation in the transfer phase has led to the adoption of a flat tree representation of the linguistic data. ...

  • It has also become one of the most visible representations of natural language processing to the outside world. Machine translation systems are relatively unique with respect to the extent of the coverage they attempt, and, correspondingly, the size of the grammatical and lexical corpora involved. Adding to this the complexity introduced by multiple language directions into the same system design (and the enormous procedural problems imposed by simultaneous development in several sites) gives some clue as to the optimism which presently exists for machine translation. ...

  • We present some preliminary results of a Czech-English translation system based on dependency trees. The fully automated process includes: morphological tagging, analytical and tectogrammatical parsing of Czech, tectogrammatical transfer based on lexical substitution using word-to-word translation dictionaries enhanced by the information from the English-Czech parallel corpus of WSJ, and a simple rule-based system for generation from English tectogrammatical representation.

  • Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, handcrafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. Spoken language (speech-only) understanding systems have addressed this issue of lack of robustness of hand-crafted grammars by exploiting classification techniques to extract fillers of a frame representation.
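
Several of the abstracts above rely on word n-gram matching, for example the remark that n-gram-based metrics such as BLEU struggle with morphologically rich languages like Czech. The sketch below is a minimal illustration of clipped n-gram precision against a single reference, assuming whitespace-tokenized input. It is not the full BLEU metric (no brevity penalty, no combination across n-gram orders, no multiple references), and the function names and the toy Czech sentence pair are invented here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference. Returns 0.0 if the candidate has no n-grams.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Toy pair (assumption, not from any cited paper): the candidate uses the
# right lemma for "dog" but the wrong case form ("pes" instead of "psa").
reference = "kočka vidí psa".split()
candidate = "kočka vidí pes".split()

unigram = clipped_precision(candidate, reference, 1)  # 2 of 3 unigrams match
bigram = clipped_precision(candidate, reference, 2)   # 1 of 2 bigrams match
print(unigram, bigram)
```

A single wrong inflection costs one unigram and one bigram here, even though the lexical choice is correct. That is the kind of penalty surface-form matching imposes on inflected languages, and a metric defined over a deeper representation would not necessarily impose it.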

