[Mechanical Translation, vol.3, no.1, July 1956; pp. 14-18]

An Electronic Computer Program for Translating Chinese into English

A. F. Parker-Rhodes

General Considerations

The procedure known as translation consists in the expression, through the medium of the target language, of that information which is conveyed by the text in the source language. We shall not consider here the conveyance of anything apart from "information" in the narrow sense.

We have further to consider that the information latent in the source text may not all be relevant for the purposes of the exercise. Languages differ considerably in the kinds of information which they consider as "relevant." For example, in English we cannot convey any verbal concept without at the same time adding information about when the action took place relative both to the moment of speaking and the moment of reference. In Chinese on the other hand all this extra information is regarded as irrelevant.

Differences between relevant and irrelevant information are not only due to differences in linguistic habit, but may be due to the common human tendency to include irrelevant matter rather than to risk leaving out anything of importance. Theoretically, a "sufficient" translation could be defined as one which conveyed all the relevant and none of the irrelevant information. But this would be a poor aim for a computer program, (a) because when the same "irrelevancies" are present in both languages, trouble is saved by letting them pass, and (b) the rigorous pruning of, for example, English tenses, would lead to an undesirable "pidgin" effect which can in fact fairly easily be avoided. We therefore aim instead at carrying over all the details which do not add to the operational labor involved, and as little as is necessary to inform the target text with a minimum of elegance.

Catataxis

The required information is supplied in the source text in the form of a simply-ordered series of symbols. In the case of Chinese, these symbols are "characters." I shall say nothing here as to how these characters are to be "recognized", except to emphasize that from social and moral considerations the process ought ultimately to be mechanized, and not relegated, as some have suggested, to a semi-skilled operator, which would merely replace a highly educated translator by a less developed type of worker.

The symbols in the source text, together with their ordering-relations, contain all the information available. The semantic content of these two kinds of item may be interchanged as between source and target languages. For example, we have:

    Chinese    ting1 fang2tsu    fang2tsu ting1
    English    top house         top of house

the relation which is expressed in the Chinese text by an ordering relation, is expressed in English by the addition or omission of a word.

In the case of closely-related languages such cases may be relatively few, but in general the effect of this interchangeability will be to make the distinction between "words" and "word-orderings" a nuisance. One stage of our process must therefore be to reduce all items of information, however conveyed in the source, to a common form. This stage I call "catataxy".

There are two main ways of doing this. The first is the "lexical", the second the "algorithmic". Lexical methods aim to list all the relevant forms, be they words or word-orderings, and to record for each listed item an appropriate equivalent in the target language. [An example of the application of lexical methods to catataxy is described by Mr. Richens]. On the other hand, algorithmic methods seek to prescribe rules, analogous to the rules which we learn in the elementary processes of arithmetic, whereby the significant word-orderings can be discovered and represented by numerical symbols (like those by which we convey, in the computer, the "meanings" of the separate words); and subsequently introduce further rules, to convert these symbols into others which will indicate the word order required by the target language. The method of catataxis which I have worked out is of the algorithmic type.
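
As a point of comparison with the algorithmic approach, the lexical alternative can be pictured as a plain table look-up in which the word-ordering is part of the key. The following is only a minimal sketch under that assumption; the names LEXICON and lexical_lookup are hypothetical and do not come from the paper, and the two entries are simply the ting1 / fang2tsu pair quoted in the text.

```python
# Hypothetical lexical table: each key is an ORDERED tuple of romanized
# source words, so that the word-ordering itself is a listed item.
LEXICON = {
    ("ting1", "fang2tsu"): "top house",
    ("fang2tsu", "ting1"): "top of house",
}


def lexical_lookup(words):
    """Return the listed target equivalent for an exact word sequence.

    Returns None when the sequence is not listed, which is the basic
    limitation of a purely lexical method: every relevant ordering
    has to be entered in advance.
    """
    return LEXICON.get(tuple(words))


if __name__ == "__main__":
    print(lexical_lookup(["ting1", "fang2tsu"]))
    print(lexical_lookup(["fang2tsu", "ting1"]))
```

Because the key is the whole ordered sequence, the same two words yield different equivalents in the two orders, which is the interchange of "word" and "word-ordering" information described above.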

Metalexis

Before I describe these methods in further detail, it is necessary to consider in some detail what form those symbols will take, by which the source text is represented in the machine. These symbols will be obtained as the output of a dictionary, whose input is provided by the signs delivered to it by the reading device. Here at once we come upon what is probably the most difficult question in machine translation. How are we to sort out, from the great variety of "meanings" capable of being attached to a given word, the one appropriate to the given context? The difficulty is only partly allayed by the fact that we shall be using, in practice, restricted languages. Even in the most restricted form of Chinese, for example, chung1 will have, among its possible meanings, "middle," "during," and "China," while fang4 for example will require 5 or 6 "basic" equivalents.

Two considerations can be applied to choosing the appropriate meaning in such cases: contextual and grammatical. The use of contextual criteria really amounts to further restriction of our restricted language as we go along. It will consist in practice of arranging to store in the computer a series of indications of context, drawn if possible from individual words; for example, a word such as "thrilling" could be counted as excluding the context "technical papers", while a word such as "inflorescence" would carry much weight in excluding, for example, "navigation". In connection with this system, each of the alternative meanings contained in a dictionary entry will carry a "key", arranged to "fit" (in a sense defined according to the elementary operating of the machine) the "lock" in which the accumulated contextual information is stored.

As regards the grammatical criterion of choice, each alternative might carry an indication of the kinds of other words it can be associated with. For example, chung1 after a noun preceded by such verbs as tsai4 or tao4, and/or followed by ti (chih), may safely be rendered by "among" or (with time-words) "during". These words can themselves be identified by special signs: "word-class indicators". The procedure here, therefore, will involve entering at first for each word a provisional word-class indicator, indicating the W.C.I.'s of all the alternatives not excluded by the context criterion, and then, as subsequent words are read in, the provisional W.C.I.'s must be read through to see what possibilities they exclude in regard to the grammatical contexts. It may well be necessary to go through the whole sentence twice before the full range of information is brought to bear on each word.

At the end of this process, if rightly programmed, we shall have selected a single alternative for each word of the source text, and this alternative will be represented by (a) a code sign, which the output dictionary will turn into a word of the target language, and (b) a W.C.I., being another code sign conveying the grammatical functions possible to this word in the source language in the given context. These W.C.I.'s will provide the raw material for catataxis.

The Kind of Algorithms used in Catataxis

The program by which catataxis is carried out must begin with a master-routine which will identify the various W.C.I.'s, and direct the computer to turn to the further algorithms appropriate to each case. The identification of W.C.I.'s is done by subtraction: they are arranged in the numerical order of their respective symbols and suitable quantities subtracted in turn from them; the computer will then recognize each by how soon the resulting number becomes negative. The processes applied to each word-class vary considerably. In each case, the objective is to build up, from the original W.C.I., a symbol which indicates not only the word-class of the word, according to an appropriate grammatical analysis of the language, but also its relations, so far as they are relevant, to the other words in this particular sentence. This symbol I have called a "taxon"; it is worthwhile to consider in some detail what form these taxa will take.

In principle, this is largely arbitrary; different methods may well be found convenient for different purposes. We have heard already of two possible methods of organizing sentences in mathematical terms, and the program I have proposed makes use of both "brackets" and "lattices" (or rather, chains). The only problem, in using a procedure of this type for the construction of taxa, is to select a suitable method of representing the chosen mathematical forms by the binary numerals which alone the computer can handle.

The binary representation of brackets is based in my system on the assignation of a particular binary place to each pair of brackets. Thus, in the accompanying example, in the taxa A, the square brackets [ ] enclosing the verbal group have in common, for all the enclosed words, the digits 10 in the 1st two places. The round
brackets, enclosing the "complex group" (Halliday) qualifying the verb tsou3, have in common the additional 3 digits 001; the small brackets containing the compound hua1 yuan2 have a further 11, which they share with their postpositive noun li3 (in practice, such a compound as this would be separately entered in the dictionary). In this system A (which is not the one finally adopted) one can further perceive that the relation between verb and postverbal noun is indicated by the change of 01 into 11 not only at the level of the main sentence (in the 1st two binary places), but also in the subsidiary group (in the 5th and 6th places). This, in practice, is a quite unnecessary refinement; it is possible to work out the structure of all sentences completely without this information, and to abandon it makes possible much shorter taxa and simpler programming.

I therefore turned from the system exhibited in A to that of B. Here only the smaller brackets are retained, the larger brackets being replaced by a pattern of "chains". These are represented by prefixes, in which words belonging to one chain have a 1 in a prescribed position. In the example, the main-sentence chain is represented by a 1 in the second place of the prefix, and the complex-group chain by a 1 in the first place. The word tsou3 at which the two chains join has a 1 in both places, thus showing the structure of the sentence just as clearly and much more economically than by the bracket-notation.

[Table showing the proposed arrangement of entries in the Input Dictionary. The linear order is that to be realized on the input-feed of the computer, and need not be reproduced on (say) dictionary cards.]

[N.B. The points are entered for ease of reading only; in the computer each digit has its fixed place and such aids are not needed.]

Having decided on the representational principles to be used in our taxa, we have to devise the necessary algorithms to derive the required binary forms from the given series of W.C.I.'s. This involves, first, an appropriate method of predetermining the W.C.I.'s, and, second, a set of routines for distinguishing the various groups of words which require to be recognized in the taxa. It will be noticed that in our examples the W.C.I.'s themselves form generally the last part of the finished taxon, the earlier digits being added by the algorithms. [The words yuan2 and li3 are exceptions, since their endings 1 and 101 receive an extra 1 to show that yuan2 is the second element of a compound].

To show the sort of form our algorithms take, this last is an appropriate example. First, when we find any taxon assuming a form identical with its predecessor, then the required algorithm is called in. Thus, at an appropriate stage, we arrange for the taxon to be subtracted from its predecessor; if the result is not 0, the taxon stands and is entered in the place of its W.C.I.; but if the result is 0, we have to arrange (i) to find the last 1 in the next taxon (or the last 101 if the W.C.I. has this ending), (ii) to add a 1 in the next binary place. The taxon thus amended must be substituted for its W.C.I. In most cases, we have to add the new digits at the beginning, and to facilitate this the digits forming the W.C.I. are placed in such a position that they do not have to be shifted at all during the formation of the taxon. Often, however, a taxon has to be altered in the light of subsequent words of the sentence.

Anataxis

When all the operations required in Catataxis have been completed, all the W.C.I.'s supplied in the original input have been replaced by taxa. Each taxon is thus followed, in the storage locations of the machine, by a code sign representing its chosen "meaning" in the target language. Thus every significant feature of the given sentence, whether a word or a word-ordering, is now represented by a binary numeral. This series of signs has now to be so manipulated as to indicate correctly the order of words required in the target language.

It might in some cases be possible to arrange the system of taxa so that they should give, by their own numerical order, the order of words ultimately required. However, this would necessitate the use of a different system of catataxis for each target language as well as for each source language, and also the algorithms required would be more complex than they need be. Thus, it is convenient to use a separate set of algorithms to alter the taxa, so as to achieve the required re-ordering.

This set of algorithms I call Anataxis, since it puts together again that which catataxis takes to pieces. (If the procedure is based on lexical methods, no separate stage is required for anataxis). As regards programming, it is simpler and shorter than Catataxis, and presents no special problems, at least as between Chinese and English which have rather similar word-orders; the main points are that in English the qualifying phrases, of the kind which in Chinese end in ti4 or chih1, are placed after the word qualified instead of before, and that adverbs can always (though if style is to be sought, should only sometimes) follow their verbs.

In the example given above, the group in the outer round brackets needs to be placed at the end of the sentence, and this would be achieved in my program by (i) spotting it as a qualifying group (by the sequence of prefixes 01,10,11,01, separating 10,11 as the required group) and (ii) altering these prefixes so as to read, in this case, 01,11,10 (the 11 covering both the 10 and 11 of the original sequence). In other cases, other parts of the taxa must be altered; e.g.:

    man4     10.001                10.101
                       slowly
    man4     10.0011               10.1011
                       becomes
    tsou3    10.1                  10.0
                       walking
    cho1     10.101                10.001
which, on arranging in numerical order, gives "walking slowly". The necessary change consists in interchanging 0 and 1 in the third place (of those here represented) from the left.

Anaptosis

When the target language is inflected (unless the inflections have fairly exact correlates in the source language) a further stage is required after Anataxis, in which the required inflections are added to the otherwise incomplete word-forms. With Chinese as the source language no assistance at all is provided in this direction, as this language is entirely uninflected. With English as the target, the difficulty is increased by the related (but logically distinct) circumstance that the required inflections mostly express logical categories which Chinese usually ignores, such as number and tense.

In my programming essays hitherto I have been content with rather crude solutions to the problems of anaptosis. Thus, I have suggested inserting "the" before all nouns where the Chinese gives no indication to the contrary (such as is afforded for example by ko4, chih1, etc.). Likewise, I have expected that an appropriate "blanket" tense would be acceptable in most "restricted" contexts; for example, in scientific papers, all facts may be put in the past simple, and all opinions and hypotheses in the present. The insertion of plurals can be based on the presence of particular key words. As regards case, the only distinction which appears in written English is the genitive -s, which I propose to replace everywhere by "of".

These elementary expedients would hardly serve for a more highly inflected target language, and for these anaptosis would probably have to be combined with anataxis in a single but relatively complex program.

Output

What is left in the storage of the computer when the stages of catataxy, anataxy, and anaptosis have been completed is a sequence of "words" in the order left by the anataxis routine, each of which consists of a taxon and a "meaning". The latter will have been modified so as to include sufficient information to determine the inflectional forms required, (though in a highly-inflected target language the space needed for this may be too much to be accommodated in the same location as the main "meaning" code-sign). The taxa, however, have now served their purpose and may be cleared or overwritten, so that their places could be occupied by the additional indications required.

The last stage of the process of translation may now begin: it consists in reading-out the contents of the still relevant locations, in their present order (which is that of the target language), to a suitable output dictionary which will convert the coded "meanings" directly into alphabetic signs capable of actuating a teleprinter which will write out the target text sentence by sentence. This may be done by whatever output mechanism the given computer may be fitted with. Perhaps punched teleprinter tape would be the most convenient medium.

The output dictionary need not contain any of the complications of that used for input. The latter is required to carry the necessary information for metalexis, and this process cannot be put off, since it is (in general) necessary for the determination of the W.C.I.'s which are themselves necessary for catataxis. At the output stage, however, all that is required is to decode the meaning, already determined by the code-sign which the input dictionary has supplied. Therefore, the output dictionary will work on a one-to-one basis and be correspondingly simple in design.

One of the main difficulties in mechanical translation is likely to be that of checking. In mathematical computations it is a regular and usually necessary practice to include sundry checks in the main programs. The nature of the translation process precludes this possibility. The best that can be done is to examine the output to see that it is not nonsense; this is hardly a sufficient check, but it is rather unlikely that an error in the computer would be such as to lead to "sense" other than the correct sense.
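
To make the last stages more concrete, the following is a minimal sketch of the anataxis example (man4 man4 tsou3 cho1) followed by a one-to-one output look-up. It is an illustration in a modern language, not the program described in the paper. The function names, the meaning codes 17 and 42, and the convention that the second word of each pair carries no meaning code are all assumptions made for the example; only the taxa and the rule "interchange 0 and 1 in the third place from the left" are taken from the text.

```python
def flip_third_place(taxon):
    """Interchange 0 and 1 in the third binary place from the left.

    The points in a taxon are reading aids only, so they are skipped
    when counting places but kept in the result.
    """
    places_seen = 0
    out = []
    for ch in taxon:
        if ch in "01":
            places_seen += 1
            if places_seen == 3:
                ch = "1" if ch == "0" else "0"
        out.append(ch)
    return "".join(out)


def numerical_key(taxon):
    """Digit string of the taxon without points.

    Comparing these strings left to right orders the taxa as
    left-aligned binary numerals (an assumption about what
    "numerical order" means for taxa of unequal length).
    """
    return taxon.replace(".", "")


def anataxis(entries):
    """Alter each taxon, then arrange the entries in numerical order."""
    altered = [(word, flip_third_place(taxon), code)
               for word, taxon, code in entries]
    return sorted(altered, key=lambda entry: numerical_key(entry[1]))


# Hypothetical one-to-one output dictionary: meaning code -> target word.
OUTPUT_DICTIONARY = {17: "walking", 42: "slowly"}


def read_out(entries):
    """Decode the meaning codes in their present order, skipping empty ones."""
    return " ".join(OUTPUT_DICTIONARY[code]
                    for _, _, code in entries if code is not None)


# (source word, taxon after catataxis, meaning code or None)
SENTENCE = [
    ("man4", "10.001", 42),
    ("man4", "10.0011", None),
    ("tsou3", "10.1", 17),
    ("cho1", "10.101", None),
]

if __name__ == "__main__":
    print(read_out(anataxis(SENTENCE)))
```

The re-ordering is carried entirely by the altered taxa, so the output step needs no knowledge of grammar, which matches the remark that the output dictionary can work on a one-to-one basis.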