
Scientific report: "A Framework for Syntactic Translation"






[Mechanical Translation, vol. 4, no. 3, December 1957; pp. 59-65]

A Framework for Syntactic Translation †

V. H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts

Adequate mechanical translation can be based only on adequate structural descriptions of the languages involved and on an adequate statement of equivalences. Translation is conceived of as a three-step process: recognition of the structure of the incoming text in terms of a structural specifier; transfer of this specifier into a structural specifier in the other language; and construction to order of the output text specified.

Introduction

The current M.I.T. approach to mechanical translation is aimed at providing routines intrinsically capable of producing correct and accurate translation. We are attempting to go beyond simple word-for-word translation; beyond translation using empirical, ad hoc, or pragmatic syntactic routines. The concept of full syntactic translation has emerged: translation based on a thorough understanding of linguistic structures, their equivalences, and meanings.

The Problems

The difficulties associated with word-for-word translation were appreciated from the very beginning, at least in outline form. Warren Weaver [1] and Erwin Reifler [2] in early memoranda called attention to the problems of multiple meaning, while Oswald and Fletcher [3] began by fixing their attention on the word-order problems, particularly glaring in the case of German-to-English word-for-word translations. Over the years it has become increasingly clear that most, if not all, of the problems associated with word-for-word translation can be solved by the proper manipulation or utilization of the context. Context is to be understood here in its broadest interpretation. Contextual clues were treated in detail in an earlier article. [4] The six types of clues discussed there will be reformulated briefly here. They are:

1) The field of discourse. This was one of the earliest types of clues to be recognized. It can, by the use of specialized dictionaries, assist in the selection of the proper meaning of words that carry different meanings in different fields of discourse. The field of discourse may be determined by the operator, who places the appropriate glossary in the machine; or it may be determined by a machine routine on the basis of the occurrences of certain text words that are diagnostic of the field.

† This work was supported in part by the U.S. Army (Signal Corps), the U.S. Air Force (Office of Scientific Research, Air Research and Development Command), and the U.S. Navy (Office of Naval Research); and in part by the National Science Foundation.

1. Warren Weaver, "Translation," Machine Translation of Languages, edited by Locke and Booth (New York and London, 1955)
2. Erwin Reifler, "Studies in Mechanical Translation No. 1, MT," mimeographed (Jan. 1950)
3. Oswald and Fletcher, "Proposals for the Mechanical Resolution of German Syntax Patterns," Modern Language Forum, vol. XXXVI, no. 2-4 (1951)
4. V. H. Yngve, "Terminology in the Light of Research on Mechanical Translation," Babel, vol. 2, no. 3 (Oct. 1956)
2) Recognition of coherent word groups, such as idioms and compound nouns. This clue can provide a basis for translating such word groups correctly even when their meaning does not follow simply from the meanings of the separate words.

3) The syntactic function of each word. If the translating program can determine syntactic function, clues will be available for solving word-order problems as well as a large number of difficult multiple-meaning problems. Clues of this type will help, for example, in determining whether der in German should be translated as an article or as a relative or demonstrative pronoun, and whether it is nominative, genitive, or dative. They will also assist in handling the very difficult problems of translating prepositions correctly.

4) The selectional relations between words in open classes, i.e., nouns, verbs, adjectives, and adverbs. These relations can be utilized by assigning the words to various meaning categories in such a way that when two or more of these words occur in certain syntactic relationships in the text, the correct meanings can be selected.

5) Antecedents. The ability of the translating program to determine antecedents will not only make possible the correct translation of pronouns, but will also materially assist in the translation of nouns and other words that refer to things previously mentioned.

6) All other contextual clues, especially those concerned with an exact knowledge of the subject under discussion. These will undoubtedly remain the last to be mechanized.

Finding out how to use these clues to provide correct and accurate translations by machine presents perhaps the most formidable task that language scholars have ever faced.

Two Approaches

Attempts to learn how to utilize the above-mentioned clues have followed two separate approaches. One will be called the "95 per cent approach" because it attempts to find a number of relatively simple rules of thumb, each of which will translate a word or class of words correctly about 95 per cent of the time, even though these rules are not based on a complete understanding of the problem. This approach is used by those who are seeking a short-cut to useful, if not completely adequate, translations.

The other approach concentrates on trying to obtain a complete understanding of each portion of the problem so that completely adequate routines can be developed.

At any stage in the development of mechanical translation there will be some things that are perfectly understood and can therefore serve as the basis for perfect translation. In the area of verb, noun, and adjective inflection, it is possible to do a "100 per cent job" because all the paradigms are available and all of the exceptions are known and have been listed. In this area one need not be satisfied with anything less than a perfect job.

At the same time there will be some things about language and translation that are not understood. It is in this area that the difference between the two approaches shows up. The question of when to translate the various German, French, or Russian verb categories into the different sets of English verb categories is imperfectly understood. Those who adopt the 95 per cent approach will seek simple partial solutions that are right a substantial portion of the time. They gain the opportunity of showing early test results on a computer. Those who adopt the 100 per cent approach realize that in the end satisfactory mechanical translation can follow only from the systematic enlarging of the area in which we have essentially perfect understanding.

The M.I.T. group has traditionally concentrated on moving segments of the problem out of the area where only the 95 per cent approach is possible into the area where a 100 per cent approach can be used. Looking at mechanical translation in this light poses the greater intellectual challenge, and we believe that it is here that the most significant advances can be made.

Syntactic Translation

Examination of the six types of clues mentioned above reveals that they are predominantly concerned with the relationships of one word to another in patterns. The third type, the ability of the program to determine the syntactic function of each word, is basic to the others. It is basic to the first: If the machine is to determine correctly the field of discourse at every point in the text, even when the field changes within one sentence, it must use the relationship of the words in syntactic patterns as the key for finding which words refer to which field. It is basic to the second because idioms, noun compounds, and so on, are merely special patterns of words that stand out from more regular patterns. It is basic to the fourth because here we are dealing with selectional relationships between words that are syntactically related. It is basic to the fifth because the relationship of a word to its antecedent is essentially a syntactic relationship. It is probably even basic to the last, the category of all other contextual clues.

Any approach to mechanical translation that attempts to go beyond mere word-for-word translation can with some justification be called a syntactic approach. The word "syntactic" can be used, however, to cover a number of different approaches. Following an early suggestion by Warren Weaver, [1] some of these take into consideration only the two or three immediately preceding and following words. Some of them, following a suggestion by Bar-Hillel, [5] do consider larger context, but by a complicated scanning forth and back in the sentence, looking for particular words or particular diacritics that have been attached to words in the first dictionary look-up. To the extent that these approaches operate without an accurate knowledge and use of the syntactic patterns of the languages, they are following the 95 per cent approach.

Oswald and Fletcher [3] saw clearly that a solution to the word-order problems in German-to-English translation required the identification of syntactic units in the sentence, such as nominal blocks and verbal blocks. Recently, Brandwood [6] has extended and elaborated the rules of Oswald and Fletcher. Reifler, [7] too, has placed emphasis on form classes and the relationship of words one with the other. These last three attempts seem to come closer to the 100 per cent way of looking at things.

Bar-Hillel, [8] at M.I.T., introduced a 100 per cent approach years ago when he attempted to adapt to mechanical translation certain ideas of the Polish logician Ajdukiewicz. The algebraic notation adopted for syntactic categories, however, was not elaborate enough to express the relations of natural languages.

Later, the author [9, 10] proposed a syntactic method for solving multiple-meaning and word-order problems. This routine analyzed and translated the input sentences in terms of successively included clauses, phrases, and so forth.

More recently, Moloshnaya [11] has done some excellent work on English syntax, and Zarechnak [12] and Pyne [13] have been exploring with Russian a suggestion by Harris [14] that the text be broken down by transformations into kernel sentences which would be separately translated and then transformed back into full sentences. Lehmann, [15] too, has recently emphasized that translation of the German noun phrase into English will require a full descriptive analysis.

5. Y. Bar-Hillel, "The Present State of Research on Mechanical Translation," American Documentation, 2:229-237 (1951)
6. A. D. Booth, L. Brandwood, J. P. Cleave, Mechanical Resolution of Linguistic Problems, Academic Press (New York, 1958)
7. Erwin Reifler, "The Mechanical Determination of Meaning," Machine Translation of Languages, edited by Locke and Booth (New York and London, 1955)
8. Y. Bar-Hillel, "A Quasi-Arithmetical Notation for Syntactic Description," Language, vol. 29, no. 1 (1953)
9. V. H. Yngve, "Syntax and the Problem of Multiple Meaning," Machine Translation of Languages, edited by Locke and Booth (New York and London, 1955)
10. V. H. Yngve, "The Technical Feasibility of Translating Languages by Machine," Electrical Engineering, vol. 75, no. 11 (1956)
11. T. N. Moloshnaya, "Certain Questions of Syntax in Connection with Machine Translation from English to Russian," Voprosy Yazykoznaniya, no. 4 (1957)
12. M. M. Zarechnak, "Types of Russian Sentences," Report of the Eighth Annual Round Table Meeting on Linguistics and Language Studies, Georgetown University (1957)
13. J. A. Pyne, "Some Ideas on Inter-structural Syntax," Report of the Eighth Annual Round Table Meeting on Linguistics and Language Studies, Georgetown University (1957)
14. Z. S. Harris, "Transfer Grammar," International Journal of American Linguistics, vol. XX, no. 4 (Oct. 1954)
15. W. P. Lehmann, "Structure of Noun Phrases in German," Report of the Eighth Annual Round Table Meeting on Linguistics and Language Studies, Georgetown University (1957)
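As a rough illustration of the first clue in the list of six (the field of discourse), the following sketch shows how a machine routine might pick a specialized glossary from diagnostic text words. It is only a toy under stated assumptions: the field names, the diagnostic word lists, the glossary entries, and the function names `detect_field` and `translate_word` are all invented for illustration and do not come from the paper.

```python
from collections import Counter

# Hypothetical diagnostic words for two fields of discourse (invented entries).
DIAGNOSTIC_WORDS = {
    "finance": {"zinsen", "konto", "kredit"},
    "garden": {"holz", "garten", "sitzen"},
}

# Hypothetical specialized glossaries: German "bank" is rendered differently
# depending on the field of discourse.
GLOSSARIES = {
    "finance": {"bank": "bank"},
    "garden": {"bank": "bench"},
}


def detect_field(words):
    """Count diagnostic words per field and return the most frequent field."""
    counts = Counter()
    for word in words:
        for field, diagnostic in DIAGNOSTIC_WORDS.items():
            if word in diagnostic:
                counts[field] += 1
    # No diagnostic word seen: the field stays undetermined.
    return counts.most_common(1)[0][0] if counts else None


def translate_word(word, field):
    """Look the word up in the glossary of the given field, if there is one."""
    return GLOSSARIES.get(field, {}).get(word, word)


text = "das konto bei der bank bringt zinsen".split()
field = detect_field(text)
print(field, translate_word("bank", field))
```

A whole-text count like this cannot follow a field that changes within one sentence, which is exactly the case the text says requires syntactic patterns.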
In much of the work there has been an explicit or implicit restriction to syntactic relationships that are contained entirely within a clause or sentence, although it is usually recognized that structural features, to a significant extent, cross sentence boundaries. In what follows, we will speak of the sentence without implying this restriction.

The Framework

The framework within which we are working is presented in schematic form in Fig. 1. This framework has evolved after careful consideration of a number of factors. Foremost among these is the necessity of breaking down a problem as complex as that of mechanical translation into a number of problems each of which is small enough to be handled by one person.

Figure 1 represents a hypothetical translating machine. German sentences are fed in at the left. The recognition routine, R.R., by referring to the grammar of German, G1, analyzes the German sentence and determines its structural description or specifier, S1, which contains all of the information that is in the input sentence. The part of the information that is implicit in the sentence (tense, voice, and so forth) is made explicit in S1. Since a German sentence and its English translation generally do not have identical structural descriptions, we need a statement of the equivalences, E, between English and German structures, and a structure transfer routine, T.R., which consults E and transfers S1 into S2, the structural description, or specifier, of the English sentence. The construction routine, C.R., is the routine that takes S2 and constructs the appropriate English sentence in conformity with the grammar of English, G2.

[Figure 1. A Framework for Mechanical Translation]

This framework is similar to the one previously published [16] except that now we have added the center boxes and have a much better understanding of what was called the "message" or transition language (here, the specifiers). Andreyev [17] has also recently pointed out that translation is essentially a three-step process and that current published proposals have combined the first two steps into one. One might add that some of the published proposals even try to combine all three steps into one. The question of whether there are more than three steps will be taken up later.

A few simple considerations will make clear why it is necessary to describe the structure of each language separately. First, consider the regularities and irregularities of declensions and conjugations. These are, of course, entirely relative to one language.

Context, too, is by nature contained entirely within the framework of one language. In considering the translation of a certain German verb form into English, it is necessary to understand the German verb form as part of a complex of features of German structure including possibly other verb forms within the clause, certain adverbs, the structure of neighboring clauses, and the like. In translating into English, the appropriate complex of features relative to English structure must be provided so that each verb form is understood correctly as a part of that English complex.

The form of an English pronoun depends on its English antecedent, while the form of a German pronoun depends on its German antecedent, which is not always the same word because of the multiple-meaning situation. As important as it is to locate the antecedent of the input pronoun in the input text, it is equally important to embed the output pronoun in a proper context in the output language so that its antecedent is clear to the reader.

In all of these examples it is necessary to understand the complete system in order to program a machine to recognize the complex of features and to translate as well as a human translator. If one is not able to fathom the complete system, one has to fall back on hit-or-miss alternative methods, that is, the 95 per cent approach. In order to achieve the advantages of full syntactic translation, we will have to do much more very careful and detailed linguistic investigation.

Stored Knowledge

The diagram (Fig. 1) makes a distinction between the stored knowledge (the lower boxes) and the routines (the upper boxes). This distinction represents a point of view which may be academic: In an actual translating program the routine boxes and the stored knowledge boxes might be indistinguishable. For our purpose, however, the lower boxes represent our knowledge of the language and are intended not to include any details of the programming or, more particularly, any details of how the information about the languages is used by the machine. In other words, these boxes represent in an abstract fashion our understanding of the structures of the languages and of the translation equivalences. In an actual translating machine, the contents of these boxes will have to be expressed in some appropriate manner, and this might very well take the form of a program written in a pseudo code, programmable on a general-purpose computer. Earlier estimates [9] that the amount of storage necessary for syntactic information may be of the same order of magnitude as the amount of storage required for a dictionary have not been revised.

Construction

The Construction Routine, C.R. in Figure 1, constructs to order an English sentence on the prescription of the specifier, S2. It does this by consulting its pharmacopoeia, the grammar of English, G2, which tells it how to mix the ingredients to obtain a correct and grammatical English sentence, the one prescribed.

The construction routine is a computer program that operates as a code conversion device, converting the code for the sentence, the specifier, into the English spelling of the sentence. The grammar may be looked upon in this light as a code book, or, more properly, as an algorithm for code conversion. Alternately the construction routine can be regarded as a function generator. The independent variable is the specifier, and the calculated function is the output sentence. Under these circumstances, the grammar, G2, represents our knowledge of how to calculate the function.

The sentence construction routine resembles to some extent the very suggestive sentence generation concept of Chomsky, [18] but there is an important difference. Where sentence generation is concerned with a compact representation of the sentences of a language, sentence construction is concerned with constructing, to order, specified sentences one at a time. This difference in purpose necessitates far-reaching differences in the form of the grammars.

16. V. H. Yngve, "Sentence-for-sentence Translation," MT, vol. 2, no. 2 (1955)
17. N. D. Andreyev, "Machine Translation and the Problem of an Intermediary Language," Voprosy Yazykoznaniya, no. 5 (1957)
18. Noam Chomsky, Syntactic Structures, Mouton and Co., 'S-Gravenhage (1957)
Specifiers

For an input to the sentence construction routine, we postulated an encoding of the information in the form of what we called a specifier. The specifier of a sentence represents that sentence as a series of choices within the limited range of choices prescribed by the grammar of the language. These choices are in the nature of values for the natural coordinates of the sentence in that language. For example: to specify an English sentence, one may have to specify for the finite verb 1st, 2nd, or 3rd person, singular or plural, present or past, whether the sentence is negative or affirmative, whether the subject is modified by a relative clause, and which one, etc. The specifier also specifies the class to which the verb belongs, and ultimately, which verb of that class is to be used, and so on, through all of the details that are necessary to direct the construction routine to construct the particular sentence that satisfies the specifications laid down by the author of the original input sentence.

The natural coordinates of a language are not given to us a priori; they have to be discovered by linguistic research.

Ambiguity within a language can be looked at as unspecified coordinates. A writer generally can be as unambiguous as he pleases, or as ambiguous. He can be less ambiguous merely by expanding on his thoughts, thus specifying the values of more coordinates. But there is a natural limit to how ambiguous he can be without circumlocutions. Ambiguity is a property of the particular language he is using in the sense that in each language certain types of ambiguity are not allowed in certain situations. In Chinese, one can be ambiguous about the tense of verbs, but in English this is not allowed: one must regularly specify present or past for verbs. On the other hand, one is usually ambiguous about the tense of adjectives in English, but in Japanese this is not allowed.

It may be worth while to distinguish between structural coordinates in the narrow sense and structural coordinates in a broader, perhaps extra linguistic sense, that is, coordinates which might be called logical or meaning coordinates. As examples, one can cite certain English verb categories: In a narrow sense, the auxiliary verb 'can' has two forms, present and past. This verb, however, cannot be made future or perfect as most other verbs can. One does not say 'He has can come,' but says, instead, 'He has been able to come,' which is structurally very different. It is a form of the verb 'to be' followed by an adjective which takes the infinitive with 'to.' Again the auxiliary 'must' has no past tense and again one uses a circumlocution: 'had to.' If we want to indicate the connection in meaning (paralleling a similarity in distribution) between 'can' and 'is able to' and between 'must' and 'has to,' we have to use coordinates that are not structural in the narrow sense. As another example, there is the use of the present tense in English for past time (in narratives), for future time ('He is coming soon'), and with other meanings. Other examples, some bordering on stylistics, can also be cited to help establish the existence of at least two kinds of sentence coordinates in a language, necessitating at least two types of specifiers.

A translation routine that takes into consideration two types of specifiers for each language would constitute a five-step translation procedure. The incoming sentence would be analyzed in terms of a narrow structural specifier. This specifier would be converted into a more convenient and perhaps more meaningful broad specifier, which would then be converted into a broad specifier in the other language, then would follow the steps of conversion to a narrow specifier and to an output sentence.

Recognition

One needs to know what there is to be recognized before one can recognize it. Many people, including the author, have worked on recognition routines. Unfortunately, none of the work has been done with the necessary full and explicit knowledge of the linguistic structures and of the natural coordinates.

The question of how we understand a sentence is a valid one for linguists, and it may have an answer different from the answer to the question of how we produce a sentence. But it appears that the description of a language is more easily couched in terms of synthesis of sentences than in terms of analysis of sentences. The reason is clear. A description in terms of synthesis is straightforward and unambiguous. It is a one-to-one mapping of specifiers into sentences. But a description in terms of analysis runs into all of the ambiguities of language that are caused by the chance overlapping of different patterns: a given sentence may be understandable in terms of two or more different specifiers. Descriptions in terms of analysis will probably not be available until after we have the more easily obtained descriptions in terms of synthesis.

The details of the recognition routine will depend on the details of the structural description of the input language. Once this is available, the recognition routine itself should be quite straightforward. The method suggested earlier by the author [9] required that words be classified into word classes, phrases into phrase classes, and so on, on the basis of an adequate descriptive analysis. It operated by looking up word-class sequences, phrase-class sequences, etc., in a dictionary of allowed sequences.

Transfer of Structure

Different languages have different sets of natural coordinates. Thus the center boxes (Fig. 1) are needed to convert the specifiers for the sentences of the input language into the specifiers for the equivalent sentences in the output language. The real compromises in translation reside in these center boxes. It is here that the difficult and perhaps often impossible matching of sentences in different languages is undertaken. But the problems associated with the center box are not peculiar to mechanical translation. Human translators also face the very same problems when they attempt to translate. The only difference is that at present the human translators are able to cope satisfactorily with the problem.

We have presented a framework within which work can proceed that will eventually culminate in mechanical routines for full syntactic translation. There are many aspects of the problem that are not yet understood and many details remain to be worked out. We need detailed information concerning the natural coordinates of the languages. In order to transfer German specifiers into English specifiers, we must know something about these specifiers. Some very interesting comparative linguistic problems will undoubtedly turn up in this area.

The author wishes to express his indebtedness to his colleagues G. H. Matthews, Joseph Applegate, and Noam Chomsky, for some of the ideas expressed in this paper.