[Mechanical Translation, vol. 1, no. 3, December 1954; pp. 41-46]

COMPUTING MACHINES FOR LANGUAGE TRANSLATION

T. M. Stout*
Schlumberger Instrument Company
Old Quarry Road, Ridgefield, Conn.

RESEARCH on the problems of machine translation has been going on for several years in this country and abroad.[1] To date it has been concerned primarily with the complicated linguistic problems involved in mechanical translation, since the engineers can probably build the necessary equipment. This article is intended to suggest some of the linguistic problems to the engineer and to explain some of the engineering ideas for the amateur or professional linguist. The reader is cautioned that the procedures and equipment described are not necessarily the best or most recent, and that considerable development must be done before an actual mechanical translator is built and put into operation.

* This work was done at the Department of Electrical Engineering of the University of Washington in Seattle, Washington, and was originally published in THE TREND in Engineering at the University of Washington, Vol. 6, No. 3, p. 11 ff., July 1954. The author's interest in mechanical translation and many of the ideas contained in this article are the result of conversations with Dr. Erwin Reifler of the Far Eastern Department of the University of Washington.

[1] MECHANICAL TRANSLATION, Vol. I, March 1954, published at the Massachusetts Institute of Technology. An extensive bibliography of publications in this field.

General Approach: The Language Problem

Present proposals for a mechanical translator involve, in rough terms, constructing a machine which carries out automatically the process that the human translator is imagined to use in converting a sentence from one language (the input language) into a new language (the output language). This process is assumed to consist of (1) transferring the material from the printed page to the brain (reading); (2) searching a dictionary to establish the meaning or meanings of each word in the original text; (3) selecting the correct meaning from the possible alternatives; (4) rearranging and refining the results to fit the requirements of the output language; and (5) recording the results in written or other form for future use. The general procedure may be illustrated by an example.

Suppose the translator is faced with the German sentence:

Er fand die Aufgabe zu schwer,

which may be translated, "He found the task too difficult." A German-English dictionary gives the following meanings for the individual words:

Er - he
fand (from finden) - found; thought, considered
die - the (article); that, this, he, she, it (dem. pronoun); who, which, that (rel. pronoun)
Aufgabe - task, duty; lesson, exercise; asking (of riddle); posting (of letter); registration (of luggage); giving up, shutting down (of business)
zu - to, at, in, on (preposition); too (adverb)
schwer - heavy; oppressive; clumsy; difficult; grave (illness); indigestible (food); strong (cigar)

Er can be translated only by "he." Although finden generally means "to find" in the sense of "to discover," it also has the figurative meaning "to think" or "to consider." English "find" also shares these meanings, and no great harm will be done if finden is always translated as "find." The presence of a noun following die, indicated by the capital letter or by a dictionary entry opposite Aufgabe, makes its translation "the." The translation of Aufgabe may be taken as "task" in all cases, since this meaning is general enough to include all of the other, specialized meanings; the nature of the task should be clear from the context. Zu is translated as "too" because of the following adjective, which presents the toughest problem in the sentence. The choice in this case evidently depends on the feeling that a task can be difficult, but not heavy, clumsy, grave, indigestible, or strong.

As this example suggests, a word which has only one meaning (or can arbitrarily be assigned only one meaning) will present no problems. Any word with several meanings, however, will cause considerable trouble. The selection of a particular meaning is sometimes based on grammatical considerations, sometimes on the presence of other words or types of words, and sometimes on the nature of the subject matter. In addition to the ability to read and write and search a dictionary, the machine, like the human translator, must be able to discern grammatical distinctions and the occurrence of words which determine the meanings of associated words.
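The dictionary step for the example sentence, and the idea that only words with several meanings cause trouble, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `DICTIONARY` is a hypothetical in-memory table holding only the six entries listed above (with the inflected form fand entered directly), `FIXED_MEANINGS` encodes the arbitrary single-meaning choices discussed above (finden as "find", Aufgabe as "task"), and all function names are illustrative.

```python
import re

# Hypothetical dictionary holding only the entries listed in the article.
DICTIONARY: dict[str, list[str]] = {
    "er": ["he"],
    "fand": ["found", "thought", "considered"],
    "die": ["the", "that", "this", "he", "she", "it", "who", "which"],
    "aufgabe": ["task", "duty", "lesson", "exercise", "asking", "posting",
                "registration", "giving up", "shutting down"],
    "zu": ["to", "at", "in", "on", "too"],
    "schwer": ["heavy", "oppressive", "clumsy", "difficult", "grave",
               "indigestible", "strong"],
}

# Arbitrary single-meaning assignments (finden -> "find", Aufgabe -> "task").
FIXED_MEANINGS: dict[str, str] = {"fand": "found", "aufgabe": "task"}


def tokenize(sentence: str) -> list[str]:
    """Lower-case the sentence and keep only runs of letters (including umlauts and ß)."""
    return re.findall(r"[a-zäöüß]+", sentence.lower())


def look_up(sentence: str) -> list[tuple[str, list[str]]]:
    """Pair each word with its candidate meanings; unknown words get an empty list."""
    result = []
    for word in tokenize(sentence):
        if word in FIXED_MEANINGS:
            result.append((word, [FIXED_MEANINGS[word]]))
        else:
            result.append((word, DICTIONARY.get(word, [])))
    return result


def trouble_makers(sentence: str) -> list[str]:
    """Words that still have more than one candidate meaning."""
    return [word for word, meanings in look_up(sentence) if len(meanings) > 1]


if __name__ == "__main__":
    text = "Er fand die Aufgabe zu schwer,"
    for word, meanings in look_up(text):
        print(f"{word:8} {meanings}")
    print("needs a selection rule:", trouble_makers(text))
```

Run on the example sentence, the sketch leaves die, zu, and schwer as the words that still need a selection rule, which matches the analysis above.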
Coding

At the present stage of development, it is assumed that the translating machine will work only with printed material. In addition to some obvious engineering advantages, this approach has the linguistic advantage that the written language is more distinctive than the spoken language. In English, for instance, the homonyms not-knot, pair-pear-pare, and numerous other groups of words are easily distinguished by their spelling. The number of words with the same spelling and different pronunciations, such as lead-lead and bow-bow, is much smaller.

Since most computers are designed to work with numbers, the incoming text must be converted from the written alphabet into a numerical form acceptable to the machine. Several different coding schemes are available for this purpose. One obvious procedure is simply to number the letters, using either two-digit decimal numbers or five-digit binary numbers. Coded in this manner, A-B-C-D... would become 01-02-03-04..., or 00001-00010-00011-00100.... Other codes are commonly used in standard equipment which might be incorporated in a translating machine. Machines available from IBM use the code given in Table I, in which each letter is represented by two holes punched in a column of a standard punched card; the upper hole is called a zone punch and the other is a digit punch. Standard teletypewriters use the Baudot code given in Table II, which employs five pulse positions in a manner similar to the binary code (plus a sixth pulse for timing).

TABLE I. ALPHABET CODING USED IN IBM PUNCHED CARD EQUIPMENT
(Each letter is punched as the zone of its row plus the digit of its column.)

Zone \ Digit   1  2  3  4  5  6  7  8  9
12             A  B  C  D  E  F  G  H  I
11             J  K  L  M  N  O  P  Q  R
0              -  S  T  U  V  W  X  Y  Z

TABLE II. STANDARD BAUDOT TELETYPE CODE
(X = pulse, - = no pulse, in pulse positions 1 to 5)

Letter  12345    Letter  12345
A       XX---    N       --XX-
B       X--XX    O       ---XX
C       -XXX-    P       -XX-X
D       X--X-    Q       XXX-X
E       X----    R       -X-X-
F       X-XX-    S       X-X--
G       -X-XX    T       ----X
H       --X-X    U       XXX--
I       -XX--    V       -XXXX
J       XX-X-    W       XX--X
K       XXXX-    X       X-XXX
L       -X--X    Y       X-X-X
M       --XXX    Z       X---X

Binary or teletype coding requires more digits for each letter than the decimal or IBM coding and might appear to require considerably more space. On the other hand, these codes employ only two symbols (0 and 1, pulse and no pulse) for each digit. The physical elements in the computer can therefore be simple two-state devices, such as a switch or relay whose contacts are either closed or not closed, a vacuum tube which does or does not carry current, a magnetic core which is magnetized or not, and so forth. Since it is easy to determine which state exists, reliable operation is obtained without any accurate measurements or precision components.

Input and Output Devices

A number of standard devices are available for coding the incoming text for insertion into the machine and, after the translation process is completed, for decoding and printing the translation in the output language. Teletypewriters, operated by typists with no knowledge of either language, could be used to supply electrical signals directly to the translating machine or to prepare punched paper tape for later use. Similar machines can be used to type the final output of the translator.

Input devices now available are relatively slow, so that faster means of supplying material to the translating machine would be essential. An electronic reading device, capable of working directly from the original printed text, has recently been announced.[2] Faster output devices will also be required to maintain over-all balance.

[2] Shepard, D. H., "The Analyzing Reader." A paper presented at the IRE convention in San Francisco, Aug. 19, 1953.
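The four letter codes described above can be compared side by side in a short sketch. It assumes the numbering A=1 through Z=26 for the decimal and binary schemes, transcribes the IBM zone/digit layout of Table I and the five pulse positions of Table II, and omits the sixth timing pulse of the teletype code; the function names are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase

# Five pulse positions per letter, transcribed from Table II ("1" = pulse).
BAUDOT: dict[str, str] = {
    "A": "11000", "B": "10011", "C": "01110", "D": "10010", "E": "10000",
    "F": "10110", "G": "01011", "H": "00101", "I": "01100", "J": "11010",
    "K": "11110", "L": "01001", "M": "00111", "N": "00110", "O": "00011",
    "P": "01101", "Q": "11101", "R": "01010", "S": "10100", "T": "00001",
    "U": "11100", "V": "01111", "W": "11001", "X": "10111", "Y": "10101",
    "Z": "10001",
}


def decimal_code(letter: str) -> str:
    """Number the letters A=01 ... Z=26 as two decimal digits."""
    return f"{LETTERS.index(letter) + 1:02d}"


def binary_code(letter: str) -> str:
    """Number the letters A=00001 ... Z=11010 as five binary digits."""
    return f"{LETTERS.index(letter) + 1:05b}"


def ibm_punches(letter: str) -> tuple[int, int]:
    """(zone, digit) rows punched in one card column, following Table I."""
    i = LETTERS.index(letter)
    if i < 9:            # A-I: zone 12, digits 1-9
        return 12, i + 1
    if i < 18:           # J-R: zone 11, digits 1-9
        return 11, i - 8
    return 0, i - 16     # S-Z: zone 0, digits 2-9


def encode(word: str, code) -> list:
    """Apply one letter code to every letter of an upper-case word."""
    return [code(letter) for letter in word]


if __name__ == "__main__":
    print("-".join(encode("ABCD", decimal_code)))    # 01-02-03-04
    print("-".join(encode("ABCD", binary_code)))     # 00001-00010-00011-00100
    print(encode("ZU", ibm_punches))                 # [(0, 9), (0, 4)]
    print("-".join(encode("ZU", BAUDOT.__getitem__)))
```

The binary and Baudot codes need five symbols per letter against two for the decimal and IBM codes, but every symbol is only "0" or "1", which is what allows the simple two-state devices described above.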
Storage

The dictionary needed in a mechanical translating machine might be stored on a magnetic drum such as the one shown in Fig. 1. This type of storage, in which information is stored by magnetizing small areas on the surface of a revolving cylinder, is widely used in arithmetic computers and has a number of desirable properties: a large ratio of information to volume, low access time, permanence, and simplicity.

[Fig. 1. Magnetic drum for dictionary storage, with an input-language half and an output-language half. Words (W1, W2, etc.) are stored along the length of the drum, and each letter (L1, L2, etc.) requires five tracks around the drum.]

Individual words are stored along the length of the drum (each letter being represented by a group of five magnetized or unmagnetized spots) and pass the reading heads once in each revolution of the drum. Words in the input language are stored at one end of the drum, and their equivalents in the output language at the other end. If the drum is rotated at 2,400 rpm, or 40 rps, each word is available in not more than 25 milliseconds. Following standard practice, 80 spots per inch can be placed around the circumference of the drum and 8 tracks per inch along the length of the drum. Allowing 10 letters or 50 tracks per word in both halves of the dictionary, a drum 12.5 inches long and 12 inches in diameter would hold approximately 3,000 words and their translations.

In order to reduce the average time spent in searching the dictionary, certain common words might be stored several times on the same drum. The 850-word vocabulary of Basic English could be stored three times on a single drum, so that any particular word is available in a third of a revolution or less (not over 8.3 milliseconds).
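The drum figures above follow from a few lines of arithmetic, sketched here under two assumptions: each word occupies one spot position around the circumference (so the word count is the circumference times 80 spots per inch, rounded down), and the three copies of each Basic English word are evenly spaced around the drum.

```python
import math


def revolution_ms(rpm: float) -> float:
    """Time for one drum revolution, i.e. the longest wait for any stored word."""
    return 60_000.0 / rpm


def words_around_drum(diameter_in: float, spots_per_inch: int = 80) -> int:
    """Word positions around the circumference: one spot per word in each track."""
    return math.floor(math.pi * diameter_in * spots_per_inch)


def drum_length_in(letters: int = 10, tracks_per_letter: int = 5,
                   tracks_per_inch: int = 8) -> float:
    """Length for the input-language half plus the output-language half."""
    return 2 * letters * tracks_per_letter / tracks_per_inch


def worst_wait_ms(rpm: float, copies: int) -> float:
    """Longest wait when a word is stored `copies` times, evenly spaced."""
    return revolution_ms(rpm) / copies


if __name__ == "__main__":
    print(revolution_ms(2400))                  # 25.0
    print(words_around_drum(12))                # 3015
    print(drum_length_in())                     # 12.5
    print(round(worst_wait_ms(2400, 3), 1))     # 8.3
    print(3 * 850 <= words_around_drum(12))     # True
```

The result of about 3,015 positions is the "approximately 3,000 words" of the text, and three copies of the 850-word vocabulary (2,550 entries) fit on one drum.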
To provide an adequate vocabulary for satisfactory translation, several such drums would be required. By searching all drums simultaneously, as explained below, any word in the dictionary could be found in the time required for one drum revolution. At approximately one cubic foot per drum, exclusive of the associated circuits, the space required for a vocabulary of 100,000 words or so becomes rather large. A number of tricks are available, however, for reducing the size of the mechanical dictionary.

If we are concerned with translation into English, as seems probable, many words in the input-language text will not require translation. English has borrowed extensively from other languages, and many foreign words are immediately recognizable by the English reader. A glance at a German dictionary, for example, reveals such words as Deck, Despot, Diplomat, and Dock, which are identical with the English forms; we also find Demagog, Demokrat, direkt, Distanz, and Doktor, which differ slightly in spelling but would present no real difficulties to the reader. The translation process can be by-passed for such words, and the original input word printed directly in the output. This approach must be used with caution, since the two languages may not share all the meanings and connotations of a given word, but it does offer hope for tremendously reducing the size of the mechanical dictionary.

Compound words are rather common in German and can, in fact, be invented at will by writers and speakers. If the meaning of a compound is clear from the meanings of its constituents (as is likely for all except old, well-established compounds, which will be entered as distinct words), the dictionary can be searched for each constituent separately, and the respective translations compounded on the output side.

Endings, used extensively in other languages to convey grammatical information such as tense and number, can be treated in similar fashion to effect a further reduction in the size of the dictionary. Each word might be regarded as a compound built from a stem, common to all forms of the particular word, and an ending, which may be shared with other words. The dictionary may then be split into a large stem section and a small ending section. A useful by-product of this procedure is the grammatical information made available by the identification of an ending; this may be used in the elimination of impossible translations, discussed hereafter.

The techniques used in the dissection of compounds will be valuable in still another way. If a word has more letters than are permitted by the physical size of the dictionary (ten letters in the example above), it can be split into two parts which separately signify nothing. Beratschlagen, for example, might be split into Beratsc and hlagen, with parts of the translation stored opposite each half. Dictionary space is used more efficiently in this manner, but the processing time may be increased excessively. Splitting words in order to determine parts of a compound, or stems and endings, is fraught with difficulties which must be explored by linguists. The engineering techniques for carrying out these operations have been devised, but are too involved to discuss here.
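The stem-and-ending and compound schemes, with print-through as the fallback, can be illustrated by a deliberately naive sketch. Everything here is an assumption for illustration: `STEMS` and `ENDINGS` are tiny hypothetical dictionary sections, stems are matched greedily (longest first), and umlaut changes, linking letters, and the other linguistic difficulties mentioned above are ignored. `split_for_slots` assumes the long word is cut into near-equal pieces, which happens to reproduce the Beratsc/hlagen split of the example.

```python
import math
from typing import Optional

# Hypothetical miniature stem and ending sections (illustration only).
STEMS: dict[str, str] = {"haus": "house", "tür": "door", "buch": "book"}
ENDINGS: dict[str, Optional[str]] = {"": None, "en": "plural", "es": "genitive"}


def dissect(word: str) -> Optional[list[tuple[str, Optional[str]]]]:
    """Return (translation, grammatical tag) parts, or None if no dissection works."""
    w = word.lower()
    # 1. Stem + ending, trying the longest stem first.
    for cut in range(len(w), 0, -1):
        stem, ending = w[:cut], w[cut:]
        if stem in STEMS and ending in ENDINGS:
            return [(STEMS[stem], ENDINGS[ending])]
    # 2. Compound: a known stem followed by a remainder that itself dissects.
    for cut in range(1, len(w)):
        head = w[:cut]
        if head in STEMS:
            rest = dissect(w[cut:])
            if rest is not None:
                return [(STEMS[head], None)] + rest
    return None


def translate_word(word: str) -> str:
    """Compound the constituent translations, or print the word through unchanged."""
    parts = dissect(word)
    return word if parts is None else " ".join(t for t, _ in parts)


def split_for_slots(word: str, slot: int = 10) -> list[str]:
    """Split a word longer than one dictionary slot into near-equal pieces."""
    if len(word) <= slot:
        return [word]
    n = math.ceil(len(word) / slot)
    base, extra = divmod(len(word), n)
    pieces, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        pieces.append(word[start:start + size])
        start += size
    return pieces


if __name__ == "__main__":
    for w in ["Türen", "Hauses", "Haustür", "Haustüren", "Deck"]:
        print(w, dissect(w), translate_word(w))
    print(split_for_slots("Beratschlagen"))   # ['Beratsc', 'hlagen']
```

The tags returned by `dissect` ("plural", "genitive") stand in for the grammatical by-product of identifying an ending; an unknown word such as Deck simply comes back unchanged.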
A useful requires two single-pole double-throw relays b y-product of this procedure is the gramma- for each pulse position: one relay operated by tical information made available by the identi- the incoming text and the other relay operated fication of an ending; this may be used in the by pulses from the reading heads on the magne-
  5. COMPUTING MACHINES FOR LANGUAGE TRANSLATION 45 tic drum. The path between points "a" and "b" The incoming text is supplied to all drums at is closed only when both relays are either ener- the same time. Correspondence between the gized (pulses present in both incoming word and incoming word and a dictionary entry is noted dictionary) or not energized (spaces present in on only one drum, from which the translation is both places). The occurrence of a closed path, obtained. Parallel operation of this type would t herefore, indicates that the particular pulse permit a dictionary of any desired size with the position is identical in both the incoming word access time of a single drum, but at a consid- and the dictionary. erable price in additional checking circuits. Entire letters, coded as a group of five pulses In a practical comparison system crystal di- or spaces, can be checked by a series combina- odes, transistors, or vacuum tubes would be tion of five such relay circuits, as shown in Fig. used instead of relays. These elements have no 3. In corresponding fashion, words of ten let- moving parts to limit the speed of operation and ters could be checked by a series combination require much less signal power. of fifty such relay circuits. A closed path through a long string of such circuits indicates Multiple Meaning that the incoming word has been found in the Having obtained the possible translations for dictionary, and this event can be made to initiate each word in a sentence, the machine is faced printing of the translation stored at the other with the problem of selecting the correct mean- end of the drum. ing from several alternatives. This problem An input-language word with several mean- can be attacked in a number of ways. ings can be entered in the dictionary several In technical writing many words have special- times, each time with a suitable translation. 
ized meanings which are used in all texts in a The searching procedure outlined above would given area of science. For example, Flügel in uncover each of the possible translations and a paper on aeronautical engineering is much would make them all available for further con- more likely to mean "wing" than "grand piano," sideration. To assist in the subsequent selec- both of which are given in a general dictionary. tion of one of these meanings, each translation The machine could be instructed to select the might have a "tag" stored with it, which would specialized meaning when the text is known to supply grammatical or other necessary infor- be in a specialized area (by means of appropri- mation needed by the machine. ate tags) or special dictionaries could be used. With a multiplicity of such circuits, a number A number of distinct problems can be recog- of dictionary drums could be searched simul- nized in the case of general language. As indi- taneously, as suggested schematically in Fig. 4. cated by the examples, the translation of a word
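The relay logic of Figs. 2 to 4 can be imitated in software to show what each stage computes: a relay pair closes its path when both positions agree (an equality, or XNOR, test), a series chain closes only if every pair closes (an AND), and parallel drums are scanned position by position as if they turned together. The sketch assumes the five-digit binary letter numbering described under Coding rather than the Baudot code, codes padding blanks as 00000, and uses hypothetical tag strings; it does not model drum timing.

```python
from dataclasses import dataclass

LETTERS_PER_WORD = 10   # dictionary slot size used in the text's example


@dataclass
class Entry:
    pulses: tuple[int, ...]   # 50 pulse positions: 10 letters x 5 pulses
    translation: str
    tag: str                  # hypothetical grammatical tag


def encode(word: str) -> tuple[int, ...]:
    """Five-digit binary letter numbers (A=00001 ... Z=11010); blanks pad to ten letters."""
    word = word.upper().ljust(LETTERS_PER_WORD)
    if len(word) > LETTERS_PER_WORD:
        raise ValueError("word longer than one dictionary slot")
    bits: list[int] = []
    for ch in word:
        if ch == " ":
            n = 0
        elif "A" <= ch <= "Z":
            n = ord(ch) - ord("A") + 1
        else:
            raise ValueError(f"cannot code {ch!r}")
        bits.extend((n >> shift) & 1 for shift in (4, 3, 2, 1, 0))
    return tuple(bits)


def position_closed(incoming: int, stored: int) -> bool:
    """One pair of relays: path a-b closes when both are energized or both are not."""
    return incoming == stored


def series_closed(incoming: tuple[int, ...], stored: tuple[int, ...]) -> bool:
    """Relay pairs in series: the path closes only if every pulse position matches."""
    return len(incoming) == len(stored) and all(
        position_closed(a, b) for a, b in zip(incoming, stored))


def search(drums: list[list[Entry]], word: str) -> list[tuple[str, str]]:
    """Scan every drum position by position, as if all drums turned together."""
    incoming = encode(word)
    found = []
    longest = max((len(drum) for drum in drums), default=0)
    for position in range(longest):        # one revolution
        for drum in drums:                 # all drums checked in parallel
            if position < len(drum) and series_closed(incoming, drum[position].pulses):
                found.append((drum[position].translation, drum[position].tag))
    return found


if __name__ == "__main__":
    drums = [
        [Entry(encode("ZU"), "to", "prep"), Entry(encode("ER"), "he", "pron"),
         Entry(encode("ZU"), "too", "adv")],
        [Entry(encode("FAND"), "found", "verb"), Entry(encode("DIE"), "the", "art")],
    ]
    print(search(drums, "zu"))       # [('to', 'prep'), ('too', 'adv')]
    print(search(drums, "schwer"))   # [] -> print the word through
```

Because zu is entered twice with different tags, the search returns both candidates, leaving the choice to the selection rules discussed next.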
Multiple Meaning

Having obtained the possible translations for each word in a sentence, the machine is faced with the problem of selecting the correct meaning from several alternatives. This problem can be attacked in a number of ways.

In technical writing many words have specialized meanings which are used in all texts in a given area of science. For example, Flügel in a paper on aeronautical engineering is much more likely to mean "wing" than "grand piano," both of which are given in a general dictionary. The machine could be instructed to select the specialized meaning when the text is known to be in a specialized area (by means of appropriate tags), or special dictionaries could be used.

A number of distinct problems can be recognized in the case of general language. As indicated by the examples, the translation of a word is sometimes based on grammatical considerations, sometimes on the co-occurrence of another word or type of word in the same sentence or clause, and sometimes on the larger context. In all cases, the choice is determined by examining the surrounding words and, according to rules furnished by the linguists, either selecting or eliminating certain alternatives.

The general procedure employed by the machine in selecting the proper meaning can be indicated by an example. For the German sentence given above, a superficial study suggests the following rule for the translation of zu: if zu is followed by an adjective or adverb, its meaning is "too," but otherwise it is a preposition, and its meaning must be determined by additional analysis. The translating machine can be instructed to examine the tag on the word following zu and, if the code designation for an adjective or adverb appears, to select "too" as the meaning.

Not all words present difficulties with multiple meanings, and the mechanical translator can easily locate the trouble-makers in any sentence by counting the alternatives encountered in the dictionary search. Having found a word with several possible meanings, the machine can refer to a list of rules appropriate to this word or its general class of words. This list should be flexible, so that rules can be added or discarded without disrupting the operation of the other rules. The machine can probably be arranged to count the number of times each rule is used and the number of successes scored, so that the effective rules can be applied first and ineffective rules discarded.

The linguistic rules will necessarily be coded and could, in fact, be expressed in algebraic fashion by the techniques of symbolic logic.[3] The resulting algebraic expressions can be simplified by formal procedures and can be converted directly into devices which carry out the selection process. The so-called logic circuits needed in a mechanical translator are employed in conventional arithmetic computers, and their design should pose no special problems.

[3] Langer, S. K., AN INTRODUCTION TO SYMBOLIC LOGIC. New York: Dover Publications, 1953.

Conclusion

Experiments with word-by-word translation by mechanical means have already been conducted with surprisingly good results, even where no attempt has been made to deal with the problem of multiple meanings. With even a rudimentary set of rules for selecting or eliminating some of the possible meanings, still better results should be obtained. If the linguists can discover the rules, the engineers are ready to build the equipment, given the necessary support. Practical mechanical language translation is a definite possibility for the near future.
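The selection procedure described under Multiple Meaning can be sketched as follows: tags on each candidate, a flexible rule list per trouble-making word, and counts of uses and successes that order the rules. The tag names, the die-before-noun rule (drawn from the discussion of the example sentence), and the definition of "success" as narrowing a word to a single meaning are assumptions for illustration; the text does not say how success would be measured.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Candidate = tuple[str, str]        # (translation, hypothetical grammatical tag)
Sentence = list[list[Candidate]]   # candidates per word, from the dictionary search


@dataclass
class Rule:
    name: str
    apply: Callable[[Sentence, int], Optional[list[Candidate]]]
    used: int = 0
    successes: int = 0

    def score(self) -> float:
        """Success rate, so that effective rules can be tried first."""
        return self.successes / self.used if self.used else 0.0


def next_tags(sentence: Sentence, i: int) -> set[str]:
    """Tags carried by the candidates of the following word, if any."""
    return {tag for _, tag in sentence[i + 1]} if i + 1 < len(sentence) else set()


def zu_rule(sentence: Sentence, i: int) -> Optional[list[Candidate]]:
    """zu before an adjective or adverb means "too"; otherwise keep the prepositions."""
    if next_tags(sentence, i) & {"adj", "adv"}:
        return [c for c in sentence[i] if c[0] == "too"]
    return [c for c in sentence[i] if c[1] == "prep"]


def die_rule(sentence: Sentence, i: int) -> Optional[list[Candidate]]:
    """die before a noun is the article "the"; otherwise the rule does not apply."""
    if "noun" in next_tags(sentence, i):
        return [c for c in sentence[i] if c[1] == "art"]
    return None


# Flexible rule lists: rules can be appended or removed without touching the others.
RULES: dict[str, list[Rule]] = {
    "zu": [Rule("zu-before-modifier", zu_rule)],
    "die": [Rule("die-before-noun", die_rule)],
}


def select(words: list[str], sentence: Sentence) -> list[str]:
    """Apply each trouble-maker's rules, best-scoring first, until one meaning remains."""
    chosen = [list(candidates) for candidates in sentence]
    for i, word in enumerate(words):
        if len(chosen[i]) < 2:                    # only one meaning: no problem
            continue
        for rule in sorted(RULES.get(word, []), key=Rule.score, reverse=True):
            narrowed = rule.apply(chosen, i)
            if narrowed is None:                  # rule does not apply here
                continue
            rule.used += 1
            if narrowed:                          # never eliminate every alternative
                chosen[i] = narrowed
            if len(chosen[i]) == 1:
                rule.successes += 1
                break
    return ["/".join(t for t, _ in c) for c in chosen]


if __name__ == "__main__":
    words = ["er", "fand", "die", "aufgabe", "zu", "schwer"]
    sentence = [
        [("he", "pron")], [("found", "verb")],
        [("the", "art"), ("that", "pron")], [("task", "noun")],
        [("to", "prep"), ("at", "prep"), ("too", "adv")],
        [("heavy", "adj"), ("difficult", "adj")],
    ]
    print(" ".join(select(words, sentence)))   # he found the task too heavy/difficult
```

Schwer is left undecided on purpose: it is called the toughest problem in the sentence above and no rule for it is offered, so the sketch keeps both adjective readings.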