Scientific report: "The Work on Machine Translation in the Soviet Union"
[Mechanical Translation, vol.5, no.3, December 1958; pp. 95-100]

The Work on Machine Translation in the Soviet Union*

Fourth International Congress of Slavicists Reports, Sept. 1958

V. Yu. Rozentsveig, First Moscow State Pedagogical Institute of Foreign Languages, Moscow, USSR

Problems of machine translation have been investigated in the Soviet Union since 1955.[1] A number of groups are carrying out theoretical and experimental work in the area of machine translation.

In the Institute of Precision Mechanics and Computer Technology of the Academy of Sciences of the USSR (ITM and VT), dictionaries and codes of rules (algorithms) have been compiled for machine translation from English, Chinese, and Japanese into Russian, and a German-Russian algorithm is being worked out. Experimental translations of individual passages have been made.[2] The work of the ITM and VT group shows a marked striving for the rapid achievement of immediate, practical results. The efforts of this group are directed not so much toward a theoretical understanding of the general problem of machine translation as toward a careful, detailed investigation of linguistic material, especially lexical material. Dictionary routines, routines for analyzing the sentence in the source language, and routines for synthesizing the sentence in the target language are being compiled at ITM and VT on the basis of traditional methods of describing a language.

An essentially different course is being followed by the group working in the Steklov Mathematical Institute of the Academy of Sciences (MIAN). Here the problem of machine translation is examined as part of the larger problem of the automation of thought processes. The directors of this group regard the effective practical realization of machine translation as achievable only through profound theoretical research in mathematics and linguistics.

At MIAN three algorithms have been elaborated: French-Russian, English-Russian, and Hungarian-Russian.[3] While compiling the first of these in 1955-56, the workers in this group proceeded empirically, i.e. they extracted the rules for translating each word from a comparative analysis of French texts and their Russian translations. In elaborating the English-Russian algorithm, the MIAN group set themselves a more complex problem: determining the correspondences between the grammatical structures of the two languages. This choice was partly conditioned by the nature of the relationship between English and Russian: although it had been possible to build the analysis of a sentence on a morphological basis when translating French mathematical texts into Russian, such a method did not seem rational to the MIAN group for English-Russian translation of similar texts. It was also partly conditioned by the theoretical goal of the director of the group, Professor A. A. Lyapunov: to work out strictly formal methods of describing languages in order to attain gradual automation of the whole process of machine translation.

* Translated by Lew R. Micklesen, Department of Far Eastern and Slavic Languages and Literature, University of Washington, December 1958.

[1] The idea of machine translation was advanced as early as the 1930s by the inventor-technician P. P. Smirnov-Troyansky.
[2] I. K. Bel'skaya, "Concerning Certain General Problems of Machine Translation," Abstracts of the Conference on Machine Translation, Moscow, 1958, pp. 10-14 (hereafter referred to as Abstracts CMT).
[3] See O. S. Kulagina and I. A. Mel'chuk, "Machine Translation from French to Russian," Voprosy Yazykoznaniya, 1956, No. 5; T. N. Moloshnaya, "Some Problems of Syntax in Connection with Machine Translation from English to Russian," Voprosy Yazykoznaniya, 1957, No. 4.
The theoretical basis for isolating typical sentence structures was the concept of the syntagma (according to de Saussure) or of the construct (according to Fortunatov). Machine translation, however, requires a certain modification of this system. The structural syntactic analysis proposed by T. N. Moloshnaya, the author of the English-Russian algorithm worked out at MIAN, isolates constructs consisting not only of two members but also of many members (constructions with an absolute participle, etc.). Such elementary structures were called configurations. They are composed of words classified according to formal signs. The analysis consists in reducing each configuration to its basic word, that is, shortening it. In this way, syntactic links are established between the words of a sentence. The Russian sentence is synthesized by substituting, for a given English configuration, the Russian configuration that corresponds to it, and completing that configuration with Russian words on the basis of the dictionary data (more precisely, the Russian part of the dictionary) and the corresponding morphological rules.

The dictionary for machine translation, as compiled at MIAN during work on the French-Russian algorithm, consists of two parts: (1) the foreign part, containing the words of the given language (more precisely, their stems, i.e. the graphically invariable parts of a word) with tags indicating part of speech, idiomatic relationships, prepositional government, and grammatical characteristics; and (2) the Russian part, containing Russian stems and the corresponding information about them. The Russian part of the dictionary is independent of the foreign part, so it can be used in translating from various languages. The rules for the morphological form of a Russian word are likewise independent of the language from which the translation is made.

The significance of the MIAN English-Russian algorithm lay in the following. In all preceding algorithms, the analysis of the text under translation was carried out in terms of a translation into Russian (a category of the Russian language was ascribed to a foreign word). In T. N. Moloshnaya's algorithm, by contrast, the structural-grammatical analysis of an English sentence proceeded, in principle, independently of the language into which the text was being translated. This is extremely important, for independent analysis opens the way to machine translation not only from one concrete language to another, but also from many languages to many others.

Several scientific groups are now working along the path opened up by the MIAN group. In the division of applied linguistics of the Institute of Linguistics of the USSR, directed by A. A. Reformatsky, I. A. Mel'chuk is working out rules for the analysis and synthesis of a text, together with an abstract system of lexical and syntactic correspondences between various languages that is independent of translation into any concrete language. All of this should make it possible to translate from several languages into several others (a model of such an intermediary language is being built on the basis of an analysis of Russian, English, Chinese, French, and Hungarian). Syntactic analysis lies at the basis of the translation system being developed by I. A. Mel'chuk; morphological data are employed only as auxiliary data in establishing configurations, i.e. in bringing out the relationships between words in the source language and expressing these relationships by means of the target language.

In this connection one should mention the research on isolating and cataloguing the system of relationships in the Russian language, carried out in close collaboration with I. A. Mel'chuk in the Laboratory of Electrical Modelling of the All-Union Institute of Scientific and Technical Information of the State Scientific-Technical Committee of the Council of Ministers of the USSR and of the Academy of Sciences of the USSR (LE). Working on Russian mathematical texts, the staff of this laboratory, Z. M. Volotskaya, E. V. Paducheva, I. N. Shelimova, and A. L. Shumilina, isolated and described about 200 syntagmas (two-membered constructs in a subordinate relationship) that are essential for both the analysis and the synthesis of a Russian sentence.

A substantial contribution to the theory of translation algorithms and their programming was made by O. S. Kulagina (MIAN). She developed a system of so-called elementary operators, the simplest steps of which any translation process may consist, together with programs corresponding to these steps. As a result, significant generalization and standardization can be attained in the process of making algorithms, which allows us to pose the problem of automating the programming of algorithms and then the problem of automating their construction.
The Experimental Laboratory of Machine Translation of Leningrad State University (ELMP), under the directorship of N. D. Andreyev, is also endeavoring to develop completely independent methods of analysis and synthesis, together with an abstract logical system that makes it possible to go from analysis to synthesis, i.e. a system that will serve as an intermediary language. This laboratory is investigating extensive material from various linguistic systems: Indonesian-Russian, Arabic-Russian, Japanese-Russian, Burmese-Russian, Norwegian-Russian, English-Russian, Spanish-Russian, and Turkish-Russian algorithms are being developed. The intermediary language that N. D. Andreyev is attempting to create is an artificial language constructed by averaging the phenomena of various languages. It is regarded as a material language with its own lexicon, morphology, and syntax, with the one peculiarity that it consists of symbols.* In selecting the categories underlying his symbolization, N. D. Andreyev takes into account the most frequent phenomena and also the international prestige of each language.[4] The system of signs developed in ELMP for recording the intermediary language can also be used for recording information in information machines.

Gorki State University is working on algorithms for machine translation from foreign languages into Russian and from Russian into foreign languages. Alongside this work, the following algorithms are being elaborated: Armenian-Russian and Russian-Armenian (in the Computation Center of the Academy of Sciences of the Armenian SSR), and Georgian-Russian and Russian-Georgian (in the Institute of Automation and Telemechanics of the Academy of Sciences of the Georgian SSR).

At the First Moscow State Institute of Foreign Languages (I MGPIIYa), theoretical investigations have been carried out under the directorship of I. I. Revzin. They cover the problems of machine translation and related problems of the linguistic theory of translation and of foreign-language teaching methodology. The institute has now begun elaborating Russian-English, Russian-French, and Russian-Spanish translation algorithms for foreign-policy texts. The Institute has established a Machine Translation Society, at whose meetings theoretical problems are discussed and ideas are exchanged about the practical problems of compiling algorithms. The bulletin published by the Society carries both theoretical and experimental work connected with the problem of machine translation. In May 1958 the Society convened the First All-Union Conference on Machine Translation. Seventy-nine institutions were represented at the conference, including twenty-one institutes of the Academy of Sciences of the USSR, eight institutes of the Academies of Sciences of the Union Republics, eleven universities, and nineteen other institutions of higher learning. Linguists, mathematicians, and technicians took part in the work of the conference. At its plenary and sectional meetings, more than seventy reports and communications were discussed. Some addressed general linguistic problems arising from the use of language in present-day automatic devices; others addressed special problems of constructing algorithms for machine translation.[5]

The central problem now confronting linguists working in machine translation is that of methods for the formal description of linguistic structures. Structural methods, particularly those elaborated by descriptive linguistics, offer much of value for the formal description of language. It was not by accident that Fries's work on the structure of the English language proved useful in working out English configurations. It has become clear, however, that these methods are inadequate for describing language as formally as automatic translation demands. Accordingly, a search began for ways of applying mathematical methods to the analysis of language. With this in mind, the Department of Philology of Moscow State University initiated a seminar on mathematical linguistics in 1956, bringing together mathematicians and linguists under the direction of P. S. Kuznetsov, V. V. Ivanov, and V. A. Uspensky. There, as well as at the meetings of the Machine Translation Society, participants discussed the idea, suggested by Academicians A. N. Kolmogorov and A. A. Lyapunov, of applying the methods of mathematical logic and set theory to the study of language.

Thus, for example, A. N. Kolmogorov's idea about the possibility of a strict formal definition of the category of case was expounded and developed (the work of V. A. Uspensky and, in part, also of R. L. Dobrushin). It is interesting to note that, according to this definition, eight cases can be counted in the declensional system of the Russian substantive.

A method for defining grammatical categories, worked out by O. S. Kulagina (MIAN), a student of Professor Lyapunov, was discussed at the seminar. This method allows one to obtain, independently of the concrete features of a language, a classification of words and a determination of their syntactic relationships. In this conception, language is regarded as a set of elements: words or, more exactly, word forms. A finite number of words arranged in a definite order is called a sentence. Certain sentences are assumed to be marked (those constructed according to the norms of the given language); the others are unmarked. By the criterion of mutual substitutability of words in marked sentences, the entire set of words is broken down into groups of mutually equivalent words.

In terms of this system, a series of definitions corresponding, in general, to certain traditional morphological categories (for example, parts of speech) was successfully obtained. The advantage of this classification, however, lies in the fact that it has been deduced from an exact and strictly formal system of definitions. It is particularly effective for languages with a fairly symmetrical system of word forms (for example, French). For languages like Russian that lack this symmetry, the method of defining a grammatical category proposed by R. L. Dobrushin can be used.

Using the criterion of equivalence, the relationships between the isolated classes of words are also determined. Moreover, the concept of configuration, mentioned earlier, receives a more exact definition. O. S. Kulagina defines a configuration as a combination of not less than two words, belonging to different non-intersecting subsets, that can be reduced to one element without any marked sentence containing this configuration losing its marked quality. Thus the combination "thick book" in the sentence "the thick book lies on the table" can be reduced to the element "book", or replaced by the element "thing" or the element "it", without the sentence ceasing to be marked. Isolating the configurations allows one to determine the syntactic structure of the sentence.

The set-theory concept of language is strictly deductive and formal. This is precisely what determines its importance both for general linguistics and for machine translation. Naturally, the formal description of language is possible only to a limited extent. Thus the concept of the marked quality of sentences, without which it is impossible to determine the equivalence of elements and configurations of a language, will be of little use if it is extended to all functional areas of language. But within a limited sphere of language, this concept is sufficiently exact and effective, and machine translation is at present being considered only within the limits of scientific and technical prose. Thus, all sentences of a given language that occur in a given field of scientific literature can be considered marked.

The set-theory conception of language is important in yet another respect. It allows us to construct and investigate a grammatical model, i.e. a simplified analog of actual linguistic relationships, and so this theory opens one of the possible ways to logico-semantic investigations of language. In this connection we should point to V. V. Ivanov's ideas about the possibility of applying mathematical methods to defining the lexical meaning of words. I note that, contrary to widespread opinion, the theory of machine translation is not limited to investigating language in its formal aspect alone. The search for methods of objective, precise description of the system of meanings in language has begun.

If it is true that a complete formal description of an actual language is hardly attainable, and that only formal approximations to actual language can be achieved, then a statistical evaluation of the probability of this approximation acquires special importance.[6] On the other hand, certain phenomena of language do not, for the time being, yield to structural description and can be formally described only statistically.

* Translator's note: The author obviously means symbols different from the conventional symbols of language.

[4] N. D. Andreyev, "Machine Translation and the Problem of an Intermediary Language," Voprosy Yazykoznaniya, 1957, No. 5.
[5] See Abstracts CMT, Moscow, 1958.
[6] See V. A. Uspensky, "Conference on the Statistics of Speech," Voprosy Yazykoznaniya, 1958, No. 1, p. 173.
As a rule, all the algorithms formulated have taken into account the quantitative aspect of linguistic phenomena, both lexical and grammatical. One should point particularly to the statistical investigations carried out on Russian language material in the Laboratory of Electrical Modelling. I have already mentioned the cataloguing of Russian syntagmas. This work was accompanied by a statistical investigation of the language of Russian mathematical texts. The work was conducted by I. A. Mel'chuk, T. N. Moloshnaya, A. L. Shumilina, Z. M. Volotskaya, and I. I. Shelimova. Its results were announced, along with other works, at the conference on the statistics of speech convened in October 1957 by the Section of Speech of the Commission on Acoustics of the Academy of Sciences of the USSR and by Leningrad University. This work is of interest not only in a practical respect. Its value lies in a genuine solution to the problem of combining statistical and structural methods: the authors counted linguistic elements on the basis of clear-cut definitions of such concepts as "syntagma", "type of syntagma", etc. As I. I. Revzin showed in his report at that conference, the correlation of structural and statistical methods is two-sided. Statistics helps to specify the structure of language, and an exact structural definition of the units being counted ensures that the statistical investigation is properly conducted.

A frequency count of dictionary units is important not only in connection with machine translation. Statistical investigations of problems of general and particular linguistics have already become traditional,[7] and we leave them aside here. Instead, we shall point to recent works connected with the use of language in various devices for the storage, processing, and transmission of information. For Russian material, we can call attention to the use of machine translation methods for coding telegraphic and telephonic messages.

It has been established (V. I. Grigor'ev and G. G. Belonogov) that the size of a telegraph message in Russian can be reduced by a factor of 3-4 if the message is translated from a letter code into a dictionary (lexical) code. Statistical investigations have shown that with such coding, 4,000 common words would suffice to ensure the transmission of 97.5 percent of a general-language text.

The problem examined here concerns, for the most part, the analysis of the text under translation. For Soviet specialists, working out effective methods of analysis presented special difficulties, since they dealt primarily with morphologically poor languages. It would be erroneous, however, to assume that the synthesis of the Russian sentence presented them with no serious difficulties. By way of illustration we may cite the difficulties arising in the synthesis of Russian aspectual forms, since the category of aspect permeates the entire Russian verbal system.

Here two problems of principle arise. First, it is necessary to find a principle for classifying Russian verbs that will allow us to obtain for each verb, in an absolutely regular way (by adding or removing the same letters), all forms of the perfective as well as of the imperfective aspect. Such work was done by Z. M. Volotskaya (LE). She obtained three breakdowns of the whole Russian verbal system according to method of formation: a) of present tense forms; b) of past tense forms; and c) of the perfective stem from the imperfective stem.[8] Second, it is necessary to work out rules for choosing one or the other aspectual form, and this task is much more difficult. The tendency to carry out the operations of synthesis independently of those of analysis has already been noted. These rules must therefore be constructed on the basis of contextual data, taking into account, for example, the presence of adverbs in the sentence, the character of the combination, etc. In a number of cases one must settle for a probable solution based on statistics.

The problem of machine translation from Russian naturally occupies Soviet investigators less than the problem of translation into Russian. But investigative work on the analysis of the Russian sentence has already begun, chiefly in the Laboratory of Electrical Modelling, the Division of Applied Linguistics of the Institute of Linguistics of the Academy of Sciences of the USSR, and ITM and VT. From the point of view of general linguistics, the most interesting work is that revealing the redundancy of certain categories of the Russian language. For example, the category of gender in the Russian verb, expressed only in the -l forms of the singular of the past tense and of the conditional mood, is redundant, i.e. unnecessary from the standpoint of analysis. It has been established (V. N. Vinogradova, Institute of Linguistics of the Academy of Sciences of the USSR) that in scientific texts the proportion of verbs with an expressed form of gender ranges from four to thirty percent. In the majority of sentences the verb can be related only to the subject, the only substantive in the nominative case. Nor is it necessary, in most cases, to take the inflection of the Russian adjective into account. The relationship of the adjective to the substantive with which it agrees can be determined from the position of the adjective in the sentence (N. N. Leont'eva and G. H. Vavilova, Institute of Linguistics).

Also of interest is the work on determining syntactic links for the preposition-case groups of the Russian language (I. N. Shelimova). So is the work on elaborating syntactic links for formulas in Russian mathematical texts (M. M. Langleben). By "formulas" the author means all elements not found in the machine dictionary during processing of the text (mathematical formulas, foreign-language citations, surnames, etc.).

For the analysis of a Russian sentence it is necessary to characterize the punctuation marks. Only in this way can one find the limits of a simple clause within a sentence and isolate its similar members. It also helps clarify the interrelationships of the individual parts of a sentence with complex punctuation and determine a group of similar members. T. N. Nikolayeva (ITM and VT) conducted an analysis of polysemantic punctuation marks (comma, dash, colon) in Russian.[9]

Thus the realization of machine translation presupposes serious theoretical investigations, which in turn enrich the problems of general and applied linguistics.

[7] In this connection one should recall the statistical investigations of Russian literary works carried out in the 1920s and 1930s by A. I. Peshkovsky, M. Peterson, et al.
[8] See Abstracts CMT, p. 87.
[9] See Abstracts CMT, pp. 104-107.