Scientific report: "Stochastic Methods of Mechanical Translation"
IT IS WELL KNOWN that Western languages are 50% redundant. Experiment shows that if an average person guesses the successive words in a completely unknown sentence he has to be told only half of them. Experiment shows that this also applies to guessing the successive word-ideas in a foreign language.
[Mechanical Translation, vol.3, no.2, November 1956; pp. 38-39]

Stochastic Methods of Mechanical Translation

Gilbert W. King, International Telemeter Corp., Los Angeles, California

IT IS WELL KNOWN that Western languages are 50% redundant. Experiment shows that if an average person guesses the successive words in a completely unknown sentence he has to be told only half of them. Experiment shows that this also applies to guessing the successive word-ideas in a foreign language. How can this fact be used in machine translation?

It is clear that the success of the human in achieving a probability of .50 in anticipating the words in a sentence is largely due to his experience and the real meanings of the words already discovered. One cannot yet profitably discuss a machine with these capabilities. However, a machine translator has a much easier problem - it does not have to make a choice from the wide field of all possible words, but is given in fact the word in the foreign language, and only has to select one from a few possible meanings.

In machine translation the procedure has to be generalized from guessing merely the next word. The machine may start anywhere in the sentence and skip around looking for clues. The procedure for estimating the probabilities and selecting the highest may be classified into several types, depending on the type of hardware in the particular machine-translating system to be used.

It is appropriate to describe briefly the system currently planned and under construction. The central feature is a high-density store. This ultimately will have a capacity of one billion bits and a random access time of 20 milliseconds. Information from the store is delivered to a high-speed data processor. A text reader supplies the input and a high-speed printer delivers the output. The store serves as a dictionary, which is quite different from an ordinary manual type. Basically, of course, the store contains the foreign words and their equivalents. The capacity is so large, however, that all inflections (paradigmatic forms) of each stem are entered separately, with appropriate equivalents. In addition, in each entry, identification symbols are to be found, telling which part of speech the word is, and in which field of knowledge it occurs. Needless to say many words have several meanings, may be several parts of speech, and may occur with specialized meanings in different disciplines, and it is trite to remark that these are the factors which make mechanical translation hard.

Further, in each entry there is, if necessary, a computing program which is to instruct the data processor to carry out certain searches and logical operations on the sentence.

In operation, each sentence is considered as a semantic unit. All the words in the sentence are looked up in the dictionary, and all the material in each entry is delivered to the high speed, relatively low capacity store of the data processor. This information includes target equivalent, grammar and programs. The data processor now works out the instructions given to it by the programs, on all the other material - equivalents, grammar and syntax belonging to the sentence - all in its own temporary store.

With these facilities in mind, we may now examine some of the procedures that can be mechanized to allow the machine to guess at a sequence of words which constitute its best estimate of the meaning of the sentence in the foreign language.

The simplest type of problem is "the unconscious pun" which a human may face in seeing a headline in a newspaper in his own language. He has to scan the text to find the topic discussed, and then go back to select the appropriate meaning. This can be mechanized by having the machine scan the text (in this case more than one sentence is involved), pick out the words with only one meaning and make a statistical count of the symbols indicating field of knowledge, and thus guess at the field under discussion. (The calculations may be elaborated to weight the words belonging to more than one field.)

A second type of multiple-meaning problem where the probability of correct selection can be increased substantially and can also be mechanized is the situation where a word has different meanings when it is in different grammatical forms, e.g. the two common and annoying French words: pas (adverb) "not", (noun) "step, pass, passage, way, strait, thread, pitch, precedence", and est (present 3rd singular verb) "is", (noun) "east". The probability of selecting the correct meaning can be increased by programming such as the following for pas: "If preceded by a verb or adverb, then choose 'not'; if preceded by an article or adjective, choose 'step', etc." Experiment shows this rule (and a similar one for est) has a confidence coefficient of .99 of giving the correct translation.

A more complicated type arises when a word has several meanings as the same part of speech. Here we can only look forward to an approach such as that suggested by Yngve, using the syntax rather than grammar. This type, of course, has by far the largest frequency of occurrence.

The formulas above use grammar (and we hope someday syntactical context) to increase the probability. The human mind uses in addition other types of clue. A fairly simple type, and hence one easily mechanized, is the association of groups or pairs of words (without regard to meaning). These are the well-known idioms and word pairs. In the system proposed the probability of correct translation of words in an idiom is increased almost to unity by actually storing the whole idiom (in all its inflected forms) in the store. The search logic of the machine is peculiar in that words, or word groups, are arranged in decreasing order on each "page", so that the longest semantic units are examined first. Hence no time is lost in the search procedure. Available capacity is the only criterion for acceptance of a word group for entry in the dictionary. The probability that certain word groups are idiomatic is so high that one can afford to enter them in the dictionary.

In principle, the same solution applies to word pairs. For example état has several meanings, but usually état gazeux means "gaseous state". Can one afford to put this word pair in the dictionary? Only experiment, with a machine, can determine the probabilities of occurrence of technical word pairs. Naturally, there will be room for some, and not for others. The exceptions lie in the same ground that we cannot approach with grammatical clues, but which may be solvable with the syntactical approach, although at the moment the amount of information which would have to be stored seems to be much too large.

The choice of multiple meaning like "dream/consider" (Fr. songe) is not of first importance; the ultimate reader can make his own choice easily. The multiple meaning merely clutters the output text.

The choice of multiple meaning of the so-called unspecified words like de (12 meanings), que (33 meanings) is much more important for understanding a sentence. The amount of cluttering of the output text by printing all the multiple meanings is very great, not only because of the large number of meanings for these words but also because of their frequent occurrence. Booth and Richens proposed printing only the symbol "z" to indicate an unspecified word; others have proposed leaving the word untranslated, and others have proposed always giving the most common translation. These seriously detract from the understandability. At the other extreme, one could give all the meanings. In the case of unspecified words, the reader can rarely choose the correct one, so he is given very little additional information at the expense of reducing the ease of reading. The stochastic approach of printing only the most probable permits the best effort in making sense and prints only one word, so it is easy to read. What is the probability of successful translation?

Let us look at a few unspecified French words. Large samples of de have been examined. In 68% of the cases "of" would be correct; in 10% of the cases "de" would have been part of a common idiom in the store, and hence correct; in 6% of the cases it would have been associated as "de l'", "de la" which are treated as common word pairs, and hence in the store. In another 6% of the cases it would have been correctly translated by the rule sent to the data processor from the store: "If followed by an infinitive verb, translate as 'to'." Another 2% would have been obtained by a more elaborate rule: "If followed by adverbs and a verb, then 'to'." The single example of de le + verb probably would not have been programmed or stored.

There remain then 8-10% of the cases where "in, on, from" should not be translated at all. In some of the cases "of" could have been understandable, just as in the title of this paper "Stochastic Methods of Mechanical Translation" and "Stochastic Methods in Mechanical Translation" are equivalent. Further study, of course, may reveal some other rules to reduce this incorrect percentage.

Not all unspecified words can be guessed with as high a probability, but the bad cases seem more subject to programming.

In summary, we believe that this type of attack can be quite successful, but only after a large scale study with the aid of the mechanical translation machine itself.
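As an illustration only, the field-guessing procedure described for the "unconscious pun" (count the field-of-knowledge symbols of the words that have a single meaning, then pick the most frequent field) might be sketched in modern terms as follows. The entry layout, the field names, and the sample vocabulary are assumptions made for this sketch; the paper does not specify them.

```python
from collections import Counter


def guess_field(words, entries):
    """Guess the field of knowledge of a text.

    `entries` maps a source word to a list of (equivalent, field) senses.
    This layout is hypothetical; the paper only says each entry carries
    symbols for part of speech and field of knowledge.
    """
    counts = Counter()
    for word in words:
        senses = entries.get(word, [])
        # Only words with exactly one meaning contribute to the count.
        if len(senses) == 1:
            counts[senses[0][1]] += 1
    if not counts:
        return None  # no unambiguous word found, so no guess is made
    return counts.most_common(1)[0][0]


# Hypothetical sample entries, for illustration only.
SAMPLE_ENTRIES = {
    "gazeux": [("gaseous", "chemistry")],
    "molécule": [("molecule", "chemistry")],
    "voile": [("sail", "nautical")],
    "état": [("state", "chemistry"), ("state", "politics")],
}

field = guess_field(["état", "gazeux", "molécule", "voile"], SAMPLE_ENTRIES)
```

Here the ambiguous word état is ignored in the count, and the two unambiguous chemistry words outvote the single nautical one. The weighting of words that belong to more than one field, mentioned in the paper as a possible elaboration, is not attempted.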
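The grammatical rules quoted for pas and de can be written down as a short sketch, together with the arithmetic behind the 8-10% remainder. The part-of-speech tag names and function names are assumptions of this sketch, not the paper's notation, and the fallback to "of" for de reflects the stochastic approach of printing only the most probable meaning.

```python
def translate_pas(prev_pos):
    """Rule quoted for 'pas', keyed on the preceding word's part of speech."""
    if prev_pos in ("verb", "adverb"):
        return "not"
    if prev_pos in ("article", "adjective"):
        return "step"
    return None  # the quoted rule ends with "etc."; other cases are left open


def translate_de(following_pos):
    """Rules quoted for 'de', keyed on the parts of speech that follow it."""
    # Rule 1: followed by an infinitive verb -> 'to'.
    if following_pos[:1] == ["infinitive"]:
        return "to"
    # Rule 2: followed by one or more adverbs and then a verb -> 'to'.
    i = 0
    while i < len(following_pos) and following_pos[i] == "adverb":
        i += 1
    if i > 0 and i < len(following_pos) and following_pos[i] in ("verb", "infinitive"):
        return "to"
    # Otherwise print only the most probable meaning.
    return "of"


# Percentages reported in the paper for samples of 'de'.
DE_COVERAGE = {
    "'of' is correct": 68,
    "stored idiom": 10,
    "stored word pair (de l', de la)": 6,
    "rule: infinitive follows": 6,
    "rule: adverbs and verb follow": 2,
}
covered = sum(DE_COVERAGE.values())
remaining = 100 - covered
```

The listed cases sum to 92%, leaving 8%, which agrees with the lower end of the "8-10%" remainder stated in the paper.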
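The search logic described for idioms and word pairs, in which the longest semantic units are examined first, resembles what is now called greedy longest-match lookup. The following is a minimal sketch under that assumption; the function name, the `max_len` limit, and the sample dictionary are hypothetical, and the paper's physical arrangement of entries on each "page" is not modeled.

```python
def translate_longest_first(words, dictionary, max_len=3):
    """Translate a word list, trying the longest stored word group first."""
    output = []
    i = 0
    while i < len(words):
        # Try groups of decreasing length starting at position i.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = " ".join(words[i:i + n])
            if key in dictionary:
                output.append(dictionary[key])
                i += n
                break
        else:
            # No entry of any length: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return output


# Hypothetical entries; only the état gazeux pair comes from the paper.
SAMPLE_DICTIONARY = {
    "état gazeux": "gaseous state",
    "état": "state",
}

result = translate_longest_first(["état", "gazeux"], SAMPLE_DICTIONARY)
```

Because the two-word group is tried before the single word, état gazeux is rendered as "gaseous state" rather than being translated word by word.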