
Scientific report: "Preprogramming for Mechanical Translation"




[Mechanical Translation, vol. 3, no. 1, July 1956; pp. 20-25]

Preprogramming for Mechanical Translation

R. H. Richens

TRANSLATION is a species of communication in which the set of symbols adopted by the communicator is changed into another set of symbols before reception. It is possible to argue that all communication involves such a substitution of symbols and that communication within a single language is merely a limiting case of translation. For present purposes, however, we shall confine the scope of discussion to translation between different spoken or written languages.

We have next to inquire as to what remains invariant in translation. If we try to convey the maximum significance of the symbols of the base language, it is clear that a great deal is involved: gross meaning, the subtler overtones, deliberately concealed meanings, manifestations of the subconscious mind, the sound of the base words or their appearance in script, metrical characteristics, etymology, the associations engendered by the communication, the statistical characteristics of the communication as a sample of the output of a particular author or period, and the pleasure or otherwise engendered by communication in an informed or cultivated recipient. It is obvious that a mere fraction of all this comes over in any translation and hence we derive the notion of translation as a scaled process. We translate at various levels and in respect of various characteristics. An additional limitation on the precision of translation is provided by the peculiarities of the target language, which may contain no symbol for an idea in the base language, a frequent occurrence in the case of exotic plants or animals, or no method of rendering an idea without adding an inaccurate qualifier, as in Chinese-to-English translation where the neutrality of the Chinese noun with respect to number cannot be preserved.

The notion of level or mode of translation is important. Machine translation has earned a certain notoriety for its indulgence in very low-level translation and its fondness for what has come to be known as mechanical pidgin. For certain purposes, however, such as locating allusions, low-level translation may be all that is required. Confusion only occurs if the mode of translation is not made clear.

We are now in a position to discuss the notion of preprogram. Machine translation depends on collaboration between linguists, engineers and an obscure set of people interested in the bridge territory between the two, where problems of logic and semantics arise. It is not to be expected that a person whose primary interests are linguistic will appreciate the nicer details of electronic circuitry. It is therefore important to develop procedures that are comprehensible to linguists and engineers alike and can be used as the basis for developing detailed programs for any particular machine. Such general procedures are referred to here as preprograms. Till now, the devices principally used for experiments in machine translation have been punched-card machines and electronic computers. It is possible that the best machine for machine translation as regards both efficiency and expense has not yet been devised. It is important therefore to develop procedures that are not tied down to any particular machine but which can easily be applied to a particular machine when required.

A question that is of considerable interest is the optimum combination of man and machine. It has come to be generally recognized that machine translation with intensive human pre- and post-editing is hardly worthwhile, since this method is largely concerned with remedying the defects of the machine. A far more satisfactory concept is that of companionship. An efficient translating machine that can operate whenever required, can continue when its human partner is fatigued, can instruct its partner without the wearisome labor of consulting dictionaries and grammars, and can retire quietly into the background when the human partner desires to exercise his powers unaided qualifies in considerable measure as a good companion.

After these preliminaries, we can proceed directly to concrete problems.

The following convention will be used. A term in single quotes is used to represent the word in the target language of which the quotation is a common meaning.

For purposes of machine translation it is convenient to distinguish between the following operations:
1. Transfer of meaning.
2. Transfer of ambiguity.
3. Transfer of structure.
4. Injection, when, for example, number is attached to a neutral Chinese noun.
5. Restraint, preventing the machine from excessive semantic analysis.

The first stage in machine translation is character recognition. There are three possible methods:

1. Complete human recognition, in which a reader deals with a familiar script.
2. Incomplete human recognition, in which certain visual characteristics of an unknown script are picked out.
3. Photoelectric recognition, using standard fonts.

This stage is of very considerable importance as far as the economics of machine translation is concerned, but is irrelevant to the subsequent operations and is therefore excluded from the preprogram.

The outcome of recognition is the conversion of the symbols of the base text into a functional equivalent such as holes in punched cards or teleprinter tape. Having obtained a functionalized text, the next stage is matching against a mechanical word-dictionary. This operation has been discussed in some detail by R. H. Richens and A. D. Booth¹, and I shall only refer to essentials now. Each word of the base text must be matched against the entire mechanical dictionary, searching backwards. In some cases, a presorting of the base text into alphabetical order will expedite this operation. Then, as soon as a dictionary word is encountered which is wholly contained in the base word, the equivalent or equivalents in the target language must be entered. Should there be a residue, i.e., if a base word is inflected, the residue must then be matched against the mechanical word-dictionary in its turn. In the Chinese sentence studied by the Group, affixes do not come into the picture.

¹ Machine Translation of Languages. New York, 1955, p. 24.

A point not sufficiently considered in the earlier paper concerns languages such as Latin, with different conjugations and declensions, or, like Welsh, with initial mutation. In this case, when transferring an affix, or in Welsh, the body of the word after cutting off the mutable initials, an indication of the conjugation must be extracted from the mechanical word-dictionary. Then, when matching the detached component, the conjugation indicator must be matched simultaneously. Thus Welsh nhroed will be decomposed into

nh (t declension): no meaning
roed (t declension): 'foot'

The result of this operation is the sequence of equivalents dubbed mechanical pidgin.

Matching against the mechanical word-dictionary, however, cannot be confined to the matching of single words. In most languages, irreducible compounds occur, such as "cool off", which, in contrast to "im-possible", cannot be analyzed into semantic components. Such irreducible compounds must be entered as such in the mechanical dictionary. Then, when matching a word which may be part of an irreducible compound, it is necessary to extract both the meanings in isolation and the meaning in combination. A second matching is then necessary to ascertain whether the other component of the potential compound is present. If it is not, the compound can be erased. If the other member of the compound is present, it may be possible to accept the compound without further operation. In the Chinese sentence under consideration, the chances of encountering yung2-chieh3 'dissolve' as a pair whose components retain their isolated meanings are relatively low. It may be necessary, however, as in the case of German separable verbal prefixes, to defer a decision as to whether an irreducible compound is present until the syntax has been analyzed.

Whenever a compound is accepted, the meanings of the components in solution must be erased.

Thus, to obtain an output in mechanical pidgin, the mechanical dictionary must contain the words or parts of words of the base language, irreducible compounds, the equivalents in the target language, and indications of conjugation. In order to translate at a higher level, a more elaborate mechanical dictionary is required.

There are two types of information that we can utilize at our next level, syntactical and semantic. In the sentence "the dog bites the cat", subject and predicate are distinguished syntactically; in the sentence "this plant has yellow petals", semantic analysis indicates a botanical rather
than engineering significance for "plant". Syntactic information will be dealt with first, since it appears to present rather less complex problems than semantic information.

In order to analyze syntax, it is convenient to allocate words to word classes. In some cases these can be parts of speech, or parts of speech delimited in various ways. Sometimes, as with the Chinese chi2 'and', for which "reach" is an alternative meaning, the word class will be the sum of "and" and "verb". There is nothing against using different categories of word classes for different pairs of languages, though a general unified scheme has some obvious advantages. It is useful to allocate some of the most frequent multipurpose words to one-member classes of their own.

For utilizing syntactical information the mechanical dictionary must contain expressions for the word class of each entry; this will take the form of a number or series of numbers for each word. When translating at this level, the preliminary matching process now results in the output of a sequence of word-class expressions corresponding to the sequence of words in the base text. There are now various possibilities. Dr. Parker-Rhodes would use the word classes to provide material for a computing schedule based on a moderately restricted set of instructions. I take this as analogous to learning a foreign language by means of a grammar. The method suggested here is more analogous to learning one's native tongue, in which correct usage is arrived at by imitation over a long period with no conscious realization of rules.

The mechanical dictionary in the present method must contain a supplementary dictionary of word-class sequences. The sequence of word classes for a single sentence is then treated as a single compound or inflected word. This is decomposed into its constituents in the same way as the individual words are decomposed into stem and affix, that is, by matching the initial component first and then proceeding to the next and so on to the end. It is possible that, in the case of word-class sequences, the front may not be the best place to start, at least in some cases. This is a matter for further investigation.

The mechanical word-class sequence dictionary contains the following data under each entry:

1. Word-class sequence.
2. Rearrangement instructions.
3. Alternative instructions.
4. Pre- and post-insertion instructions.
5. Word-class equivalent.

The result of the matching procedure against the word-class sequence dictionary is to generate a series of instructions and a new word-class sequence. The latter then provides the basis for a new cycle of matching against the word-class sequence dictionary. The whole procedure is repeated until a word-class sequence is generated that is wholly contained in the mechanical dictionary. The operation is then concluded.

The accumulated instructions can then be read off, the rearrangements made, alternatives eliminated, and the necessary insertions made. In the Chinese sentence, three reductional cycles were involved. The procedure is illustrated in Table I. The output reads "however the appearance and degree of dissolv-ing of these two entities are somewhat un-alike".

The information utilized so far has been syntactical. The semantic information is more difficult to process and what follows is merely tentative.

A possible method is to attach semantic indicators to significant words and to collect the indicators as one proceeds through a passage, using the totals to decide between alternative renderings of doubtful words. Thus "petal", "stem" and "pineapple" could be accompanied by indicators for "botanical". This might help to limit "plant" to its botanical rather than its engineering sense. As Dr. Thouless has pointed out, some difficulty might be encountered with a "pineapple-slicing plant", but in this case "slicing" might carry an indicator pointing the other way. I am not in a position to say how useful this method could be. It has the advantage of collecting information as the text is traversed. However, it is obviously an extremely crude way of mobilizing semantic information and I should therefore like to consider next a more difficult but more fundamental approach.

I refer now to the construction of an interlingua in which all the structural peculiarities of the base language are removed and we are left with what I shall call a "semantic net" of "naked ideas". These bear some obvious resemblances to the linguistic configurations discussed already. The elements represent things, qualities or relations. I associate adjectives (usually monadic relations) and verbs (dyadic or higher relations) in the Japanese way. A bond points from a thing to its qualities or relations, or from a quality or relation to a further qualification.
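Returning to the word-dictionary matching described earlier, here is a minimal Python sketch of one reading of it. It assumes that "wholly contained in the base word" means an initial segment of the base word. It also assumes that "searching backwards" means scanning an alphabetically sorted dictionary backwards from the position where the base word would sort. The dictionary entries are invented. The Welsh declension-indicator check and the second pass for irreducible compounds are left out.

```python
from bisect import bisect_right

# Hypothetical toy mechanical word-dictionary: stems, affixes and a Welsh
# mutated initial, each mapped to a target-language equivalent. None marks
# a component with no meaning of its own (like Welsh "nh").
DICTIONARY = {
    "ed": "PAST",
    "im": "not",
    "nh": None,
    "possible": "possible",
    "roed": "foot",
    "s": "PLURAL",
    "walk": "walk",
}
ENTRIES = sorted(DICTIONARY)  # alphabetical order, as in a sorted card file


def longest_initial_entry(word):
    """Search backwards from the point where `word` would sort.

    Every entry that is an initial segment of `word` sorts at or before
    `word`, and a longer such segment sorts after a shorter one, so the
    first one met going backwards is the longest.
    """
    i = bisect_right(ENTRIES, word)
    while i > 0:
        i -= 1
        if word.startswith(ENTRIES[i]):
            return ENTRIES[i]
    return None


def to_pidgin(word):
    """Enter the matched component's equivalent, then match the residue in its turn."""
    pidgin = []
    while word:
        entry = longest_initial_entry(word)
        if entry is None:
            pidgin.append("?" + word)  # unmatched residue is passed through, flagged
            break
        if DICTIONARY[entry] is not None:
            pidgin.append(DICTIONARY[entry])
        word = word[len(entry):]
    return pidgin


for w in ["impossible", "walked", "nhroed"]:
    print(w, "->", to_pidgin(w))
```

Taking the longest initial match is greedy. A word whose correct split begins with a shorter entry would need backtracking, which this sketch does not attempt.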
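The cyclic reduction against the word-class sequence dictionary, described earlier in this section, can be sketched in Python as below. The word classes, entries and instruction strings are hypothetical. Matching proceeds from the front and tries the longest sequence first; this is an assumption, since the paper leaves the best starting point open. The procedure is taken to end once the sequence has been reduced to a single sentence class ("S").

```python
# Hypothetical word-class sequence dictionary: each entry maps a sequence of
# word classes to (word-class equivalent, instruction). Invented for illustration.
SEQUENCE_DICTIONARY = {
    ("DET", "N"): ("NP", "keep order"),
    ("ADJ", "N"): ("NP", "keep order"),
    ("N", "ADJ"): ("NP", "rearrange: ADJ before N"),
    ("NP", "V", "NP"): ("S", "keep order"),
}


def match_cycle(classes):
    """One cycle: match from the front, longest sequence first, then move on.

    Returns the new word-class sequence and the instructions generated.
    """
    new_sequence, instructions, i = [], [], 0
    while i < len(classes):
        for length in range(len(classes) - i, 1, -1):  # sequences of two or more
            key = tuple(classes[i:i + length])
            if key in SEQUENCE_DICTIONARY:
                equivalent, instruction = SEQUENCE_DICTIONARY[key]
                new_sequence.append(equivalent)
                instructions.append((key, instruction))
                i += length
                break
        else:  # no entry starts here: carry the class over unchanged
            new_sequence.append(classes[i])
            i += 1
    return new_sequence, instructions


def reduce_sentence(classes, goal="S", max_cycles=10):
    """Repeat matching cycles until the sequence reduces to `goal`."""
    sequence, accumulated, cycles = list(classes), [], 0
    while sequence != [goal] and cycles < max_cycles:
        sequence, found = match_cycle(sequence)
        if not found:  # nothing matched: the dictionary does not cover this sentence
            break
        accumulated.extend(found)
        cycles += 1
    return sequence, accumulated, cycles


sequence, instructions, cycles = reduce_sentence(["DET", "N", "V", "N", "ADJ"])
print(sequence, cycles)
for key, instruction in instructions:
    print(key, "->", instruction)
```

With the sample input, the first cycle yields NP V NP and the second yields S. The accumulated instructions, including the one rearrangement, are then read off and applied.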
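The tentative semantic-indicator method can be sketched as follows. The indicator table and the two renderings of "plant" are invented. Totals are collected over the whole passage before any choice is made, so that "petals" can settle a "plant" that precedes it. A tie leaves the word doubtful rather than forcing a guess.

```python
from collections import Counter

# Hypothetical semantic indicators attached to significant words.
INDICATORS = {
    "petal": ["botanical"],
    "petals": ["botanical"],
    "stem": ["botanical"],
    "pineapple": ["botanical"],
    "slicing": ["engineering"],
}
# Hypothetical alternative renderings of a doubtful word, keyed by indicator.
RENDERINGS = {
    "plant": {"botanical": "plant (organism)", "engineering": "plant (factory)"},
}


def render(words):
    """Collect indicator totals over the passage, then choose renderings."""
    totals = Counter()
    for word in words:
        totals.update(INDICATORS.get(word, []))

    output = []
    for word in words:
        choices = RENDERINGS.get(word)
        if choices is None:
            output.append(word)
            continue
        ranked = sorted(choices, key=lambda ind: totals[ind], reverse=True)
        if len(ranked) > 1 and totals[ranked[0]] == totals[ranked[1]]:
            # a tie leaves the word doubtful: keep every rendering
            output.append("/".join(choices[ind] for ind in ranked))
        else:
            output.append(choices[ranked[0]])
    return output


print(render("this plant has yellow petals".split()))
print(render("a pineapple slicing plant".split()))
```

In the second call, "slicing" carries an indicator pointing the other way, as the paper suggests. With equal weights the two indicators cancel, so both renderings are kept.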
"black cat" is

[Diagram: cat bonded to black]

"The cat is on the mat" or "The mat is under the cat" is

[Diagram: cat and mat bonded to on by bonds numbered 1 and 2]

In asymmetrical relations, the bonds are not interchangeable.

"The dog bites the cat" can be represented as

[Diagram: dog, part of, teeth, contact, cat and much, joined by bonds numbered 1 and 2]

If a different category of bond is used for doubtful or uncertain connections, a method of precisely delimiting the field of ambiguity is available.

Constructions of the type [diagram: dog, part of, teeth] are not used, since this would assume the possibility and desirability of weighting the terms of dyadic relations in terms of "superiority" or "inferiority".

When the Chinese sentence studied by the Group is represented as a semantic net, the figure obtained is of considerable complexity. What is more, various deficiencies in the information provided by the sentence become apparent; for instance, no mention is made of the solvent, without knowledge of which the significance of "solubility" is vacuous.

This raises the question of "restraint". A translator is frequently under the necessity of reproducing ambiguities or inconsistencies in the base language by corresponding ambiguities or inconsistencies in the target language. If a machine is to utilize semantic data, it must necessarily analyze the semantic relations of the passage fed into it. If this analysis is carried too far, the base passage is in danger of such severe mangling that a readable output in the target language will not be obtained. Thus in the example quoted, a machine that indulges in semantic analysis will demand information on the solvent; if, however, it is restrained to conform to the frailties of human nature, it should be possible to stop analysis at the level of the concept "solubility" and present the smooth inadequate output that a human translator is expected to provide. It might prove possible to arrange for a machine to translate at various levels of restraint so that the ordinary person and the logician can each be satisfied.

The semantic net thus represents what is invariant during translation. It can, of course, be transformed into a unique linear sequence for dictionary purposes, rather in the way that the structural formulae of organic compounds can be given linear codes for purposes of cataloguing.

The problem of extracting semantic nets from base texts is difficult and no general mechanical procedure has yet been devised. One possibility is to regard the words of the base passage as pieces in a jigsaw puzzle. Each word has a number of semantic properties (differently shaped protuberances in the jigsaw analogy) which fit in with some words but not with others. Thus the relation "see" can only attach on the left-hand side to a human being or animal. Syntax already restricts the number of possible combinations; semantics limits the possibilities still further. If syntax and semantics do not lead to a unique interlocking, we have an ambiguous situation. Ambiguity can be represented in a semantic net by introducing a second category of bonds, and can presumably be transferred to the target passage if so required.

The syntactical procedure discussed earlier in this paper dealt with a specific pair of languages. It is more satisfactory theoretically to go through an interlingua that is capable of expressing the nuances of all the languages considered in a translation program and is more adequate for logical analysis than any existing language. Such an interlingua would have the practical advantage of connecting such languages as Welsh and Japanese, where the labor of compiling a specific translation program would not be worthwhile. It is well known that two-stage translation via an intermediary language is unsatisfactory; this is only so, however, when the intermediary language is a natural rather than a universal language.

The semantic nets described above have an obvious bearing on the question of a universal interlingua. If the elements (ideas) are replaced by letters with an ideographic significance only, we have in fact an ideographic algebraic script with obvious potentialities for machine translation work. The elaboration of a system of ideographs for handling discourse is one of the current research projects of the Cambridge Group.

In conclusion, I would like to return to the notion of translation as a scaled process in which a selection has to be made of the amount of information to be transferred. It is only a further
step to the notion of translation as a limiting case of abstracting. In ordinary academic life, especially in science, abstracts are required far more frequently than full translations. In the future, the increased rate of publication is likely to make the production of abstracts far more necessary. It therefore seems that any procedure of selective transfer of ideas is likely to be of considerable future interest. Semantic nets have an obvious relevance in this connection.

This paper had, as its object, a brief description of some of the work being done by the Cambridge Language Research Group on machine translation. This work has now reached the stage where one is beginning to dabble seriously in schemes for machine abstracting.
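The semantic-net notation described earlier has four features: bonds pointing from a thing to its qualities or relations, numbered bonds for asymmetrical relations, a second bond category for doubtful connections, and reduction to a unique linear sequence. All four can be sketched in Python as a set of labelled bonds. The bond labels, the treatment of "under" as the converse of "on", and the linear-code format are assumptions made for illustration. Sorting the bonds gives a unique sequence only because element names are assumed unique within a net. This is far simpler than the canonical codes used for organic structural formulae.

```python
# Hypothetical encoding of a semantic net as a set of labelled bonds.
# Element names are assumed unique within one net.
CONVERSES = {"under": "on"}  # assumed: "A under B" is stored as "B on A"


class SemanticNet:
    def __init__(self):
        self.bonds = set()  # (from_element, to_element, bond_label)

    def qualify(self, thing, quality):
        """A bond points from a thing to its quality."""
        self.bonds.add((thing, quality, "q"))

    def relate(self, first, relation, second, doubtful=False):
        """Dyadic relation: bond 1 from the first term, bond 2 from the second.

        Numbering the bonds keeps the relation asymmetrical without ranking
        its terms; doubtful=True uses a second category of bond ("1?", "2?").
        """
        if relation in CONVERSES:
            relation, first, second = CONVERSES[relation], second, first
        mark = "?" if doubtful else ""
        self.bonds.add((first, relation, "1" + mark))
        self.bonds.add((second, relation, "2" + mark))

    def linear_code(self):
        """A unique linear sequence: the bonds in sorted order."""
        return ";".join(f"{a}>{b}:{label}" for a, b, label in sorted(self.bonds))


on_mat = SemanticNet()
on_mat.relate("cat", "on", "mat")
under_cat = SemanticNet()
under_cat.relate("mat", "under", "cat")
print(on_mat.linear_code())  # cat>on:1;mat>on:2
print(on_mat.linear_code() == under_cat.linear_code())  # True
```

"The cat is on the mat" and "The mat is under the cat" thus receive the same code, in line with the paper's point that both sentences have one semantic net. Swapping the terms of "on" gives a different code, because the numbered bonds are not interchangeable.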