Scientific report: "The Thesaurus in Syntax and Semantics"


[Mechanical Translation, vol. 4, nos. 1 and 2, November 1957; pp. 35-43]

The Thesaurus in Syntax and Semantics†

M. M. Masterman, Cambridge Language Research Unit, Cambridge, England

The recent work of the Unit has been primarily concerned with the employment of thesauri in machine translation. Limited success has been achieved, in punched-card tests, in improving the idiomatic quality and so the intelligibility of an initially unsatisfactory translation, by word-for-word procedures, from Italian into English, by using a program which permitted selection of final equivalents from "heads" in Roget's Thesaurus, i.e. lists of synonyms, near-synonyms and associated words and phrases, instead of from previously determined lists of alternative translations. The Unit is investigating whether the syntactic properties of a word in a source language may be defined by a simple choice program, with reference to extra-linguistic criteria, which might be of universal or extensive interlingual application. It is hoped to combine or reconcile such a program with R. H. Richens's procedure for translating syntax by means of an interlingua, which has proved effective in a small-scale test. Studies have been made of the complementary distribution in literary English of words and phrases from "heads" in Roget, and of the construction of discourse from the contents of selected "heads." The possibility of producing a thesaurus better suited for machine translation purposes than Roget's, to be based on a more restricted lexis and a simpler categorization, is to be examined.
AT THE Second International Conference on Machine Translation, held at the Massachusetts Institute of Technology October 16-20, 1956, members of the Cambridge Language Research Group¹ presented four papers² which together opened up a new approach to certain linguistic problems of machine translation. As a result of discussions which followed, a Research Unit was formed at Cambridge, with the support of the National Science Foundation of the United States, to investigate these problems further.³

† This paper has been written with the support of the National Science Foundation, Washington, D.C.
1. The Group is a private, informal research society, most of whose members hold appointments in the University of Cambridge (see MT, Vol. 3, No. 1, p. 4). The Unit, concerned specifically with machine translation and library retrieval methods, was formed mainly from members of the Group, with some additional workers.
2. M. Masterman, "Potentialities of a Mechanical Thesaurus"; A. F. Parker-Rhodes, "An Algebraic Thesaurus"; R. H. Richens, "A General Program for Mechanical Translation between Any Two Languages via an Algebraic Interlingua" (reported MT, Vol. 3, No. 2); M. A. K. Halliday, "The Linguistic Basis of a Mechanical Thesaurus", now published MT, Vol. 3, No. 3.
3. See Annual Report of the National Science Foundation 1957 (in the press).

One of the great problems of machine translation is that of providing any device, programable on a machine, for translating idiomatic or metaphoric uses of words when these uses cannot be foreseen, since they may be occurring for the first time in the language which is being translated. To meet this problem, three of the Cambridge research workers, M. M. Masterman, A. F. Parker-Rhodes and M. A. K. Halliday, recommended that a mechanizable procedure for producing non-literal, "idiomatic" translations should be tried. This procedure required an extra dictionary, compiled not on the principles of an alphabetic dictionary, but of a thesaurus,⁴ to be inserted into the machine handling the target language. Thus, if the target language were English, the main part of the procedure would consist in retranslating an initially unsatisfactory translation, obtained by the word-for-word procedures long known to be feasible in machine translation, into idiomatic English. The actual translation procedure, moreover, did not consist, as had all mechanical translation procedures up to that time, of programing the machine to make a selection between the members of a finite set of antecedently given translations of a source language word. It consisted, on the contrary, of a procedure for mechanically producing from a thesaurus a finite set of extensive lists of synonyms of a particular word; that is, of a total dictionary in miniature; and of then choosing, by a two-stage procedure, firstly from among the lists, and secondly from among the synonyms.

Thus, by looking up the word 'plant,' say in the cross-reference dictionary of a thesaurus, a set of numbers can be obtained, each standing for a list of synonyms, which might appear in one context, of the word 'plant:' "plant as place, 184: as insert, 300: as vegetable, 367: as agriculture, 371: as trick, 545: as tools, 633: as property, 780: – 'a battery,' 716: – 'oneself,' 184: – 'ation,' 184, 371, 780." This last represents an actual extract from the cross-reference dictionary of Roget's Thesaurus. Initially, the machine cannot know which of these lists of synonyms of 'plant' it should choose. But suppose that the word 'plant' were preceded, in the text, by the word 'flowering.' The cross-reference dictionary entry for 'flowering' is as follows: "flower as essence, 5: as produce, 161: as vegetable, 367: as prosper, 734: as beauty, 845: as ornament, 847: as repute, 873: – 'of age,' 131: – 'of flock,' 648: – 'of life,' 127: – 'painting,' 556, 559." There is only one context in common between the context list of 'plant' and the context list of 'flowering,' namely, 367, 'Vegetable.' We therefore correctly assume that the synonym list under Vegetable is the synonym list required, if a synonym is in fact required for the basic word 'plant.'

4. The only way of defining the notion of a thesaurus, in practice, is by reference to the famous work of Roget, Thesaurus of English Words and Phrases (Longmans, Green and Co.).
5. Locke and Booth, Machine Translation of Languages (New York and London, 1955). See esp. Chapter II; Richens and Booth, Some Methods of Mechanized Translation.
6. I. S. Mukhin, An Experiment in the Machine Translation of Languages Carried out on the B.E.S.M. (Moscow, 1956); examples: 'category' (chart on p. 16); 'of' (chart on p. 17).

The last stage in the procedure consists in comparing, in twos, the synonym lists which have been selected by the procedure given above in order to find which synonyms occur in common in these. Thus, if 'Woman' and 'Animal' are looked up in Roget's Thesaurus, and the synonym lists under each compared for common words, a single common word will be discovered, namely 'bitch.' These common words are then ordered, in descending order of frequency, and the most frequent provide the retranslation output, certain restrictive rules having been brought into play which are designed to decide unambiguously which synonym shall replace each initially given pidgin English word. Sometimes, as in the case of 'plant,' in 'flowering plant,' the output is the same as the initially given word; this is taken as confirmation that the original translation was right. But sometimes, in the test cases presented at the Conference, the final output was significantly different from the original word. Thus, by using what came to be known as the "thesaurus procedure," it was shown that the Italian phrase alcune essenze forestali e fruttiferi, which had been translated, by a word-for-word translation procedure, 'forest and fruit-bearing essences,' could be retranslated 'forest and fruit-bearing examples [or specimens];' that the Italian phrase tale problema si presenta particolarmente interessante, which had been translated, by the word-for-word procedure, 'such problems self-present particularly interesting,' could be retranslated 'such problems strike one as, [or prove] particularly interesting;' and that the Italian word germogli, which had been translated by the word-for-word procedure 'sprout,' could, though with difficulty, be retranslated 'shoot.' The papers made clear that the use of such a thesaurus procedure by no means always produced a correct translation. For instance, the phrase particolarmente interessante, which had been correctly translated by the word-for-word procedure 'particularly interesting,' was retranslated by the thesaurus procedure as 'What's the matter?' Nevertheless, the examples showed that a translation device which was programable on an electronic digital computer, but which made use of the intrinsic elasticity of words, could hope to deal, in a significant number of cases, with the hitherto unsolved problem of translating idiom, metaphor, and pun.

The fourth paper presented at the Conference, by R. H. Richens, made a different, though cognate, recommendation. In it the author recommended that a completely general interlingual notation, or set of symbols, should be used to produce syntactically correct translations between languages of different types, without any effort being made to translate directly between any given pair of languages. Richens showed, moreover, that by the use of such an interlingua, and by a mechanical procedure so simple that it could be effected not only by a digital computer, but by a punched card machine, a sentence could be translated with complete syntactical correctness from Japanese into the interlingua, and from the interlingua into English, German, Latin and Welsh. Thus the Japanese passage conventionally translated as: KETSU SAKU HO GO HEI ni ICHI SAKU to2 ri SHU SHI RYU SU2 ha KO HAI JI KI ni yo tsu te I ru was rendered into English as 'the percentage of matured capsules and the number of grains of seeds of one capsule are different according to the time of hybridizing;' into German as der Prozentsatz der gereiften Kapseln und die Zahl der Grane der Samen einer Kapseln sind gemäss der Zeit des Bastardierens verschieden; into Latin as ratio per centum capsulas maturandi et numerus granorum seminum capsulae unius secundum temporem hybridizandi diversa sunt; and into Welsh as y mae canran oeddfedu masglau a rhif gronynnau hadau un masgl yn wahanol yn ol amser croesi rhywiau. And Richens' claim, made in his paper, that his interlingua was algebraic has since been justified. When subjected to mathematical logical analysis, the Richens interlingual notation was shown to possess the characteristics of a weak mathematical system.

It might be thought that such revolutionary translation proposals as these, requiring as they do such an immense amount of computer storage, would be of merely academic interest to machine translators until computer research had developed to a point considerably in advance of that at which it now is. This is by no means the case, however. Information presented at the same conference, notably in a paper by Dr. Gilbert King,⁷ made it clear that in the machine translation field, computer research is far in advance of language research; that, if the linguistic problems can be solved by any mechanizable procedure, computer engineers will find a way of programing the solution on to a machine. In a speech made on the Conference's final day, for instance, Dr. King said that procedures which had been brought forward at the Conference had convinced him that a machine could translate not merely as well as, but better than, an M.I.T. professor; since, having more storage space, it could produce a bigger vocabulary. Thus the papers presented by the Cambridge research workers at the Conference produced an atmosphere of technological hopefulness about the future prospects of mechanical translation, which did not, perhaps, take sufficient account of the fact that the basic linguistic problems, though tackled, were not yet solved.

7. King and Wieselman, Stochastic Methods of Machine Translation (International Telemeter Corporation, 1956).

After the Conference, it rapidly became clear to us that the generality of approach implied by the proposal to use a target language Thesaurus was cognate to, but not identical with, the generality implied by the proposal to use an algebraic syntactic interlingua. The more recent work of the members of the Unit has, therefore, been primarily directed towards making explicit the exact nature of the interrelations between these two proposals. For it is evident, on the one hand, that an interlingual claim is being made by the assertion that Language is such that, in it, metaphors and proverbs can, in some cases, be interchanged by means of a thesaurus. And, on the other hand, the analytic examination of Richens' interlingual algebra has established that it, itself, when interpreted, showed some, though not all, the characteristics of a thesaurus. The question therefore arose: could the two methods be unified? Could an interlingual thesaurus somehow be conjoined to an interlingual syntactic notation to produce completely interlingual idiomatic mechanical translation from any language into any other? Conversely, could syntactical correctness as well as semantic elegance be introduced into the translation program at the stage of target-language retranslation by including a syntactic section within a thesaurus, so as to produce idiomatic multilingual mechanical translation from any source language into a single target language?
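The two stages of the thesaurus procedure described earlier (first choosing among head lists, then among synonyms) can be pictured with a small Python sketch. The head numbers for 'plant' and 'flowering' are the ones quoted from Roget's cross-reference dictionary; everything else is an assumption made for illustration: the function names are hypothetical, the synonym lists in the usage example are toy data rather than Roget's actual lists, and the "restrictive rules" that settle the final choice are not specified in the paper and are left out.

```python
from collections import Counter
from itertools import combinations

# Head numbers quoted in the text from Roget's cross-reference dictionary.
CROSS_REFERENCE = {
    "plant": {184, 300, 367, 371, 545, 633, 780, 716},
    "flowering": {5, 161, 367, 734, 845, 847, 873, 131, 648, 127, 556, 559},
}


def shared_heads(word_a, word_b, index=CROSS_REFERENCE):
    """Stage one: keep only the heads that both context words point to."""
    return index[word_a] & index[word_b]


def rank_common_synonyms(synonym_lists):
    """Stage two: compare the lists in twos and rank the shared words.

    A word scores one point for every pair of lists that both contain it;
    the result is in descending order of that score. Ties keep the order
    in which the words were first counted (an assumption of this sketch).
    """
    counts = Counter()
    for first, second in combinations(synonym_lists, 2):
        counts.update(set(first) & set(second))
    return [word for word, _ in counts.most_common()]


if __name__ == "__main__":
    print(shared_heads("plant", "flowering"))  # the single head 367, 'Vegetable'
    toy_lists = [["plant", "herb", "shrub"], ["herb", "plant", "tree"], ["plant", "weed"]]
    print(rank_common_synonyms(toy_lists))
```

With the toy lists, 'plant' is shared by all three pairs and 'herb' by one, so 'plant' is ranked first, which mirrors the case in the text where the output confirms the initially given word.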
Up to this point, the nature of the mechanical translation technique had required that the major part of the Cambridge Unit's analytic work should be performed by programmers and mathematical logicians, not by linguists; for the Unit's first need was to produce an analysis of the translation process which was both sufficiently general to justify the commercial production of a future mechanical translator, and also mathematically definite enough to be mechanizable. Now, however, it became clear that essential and fundamental considerations, regarding both the nature of comparative descriptive linguistics, and the nature of philosophic logic, were tied up in all this analytic work. For, to mention only one such consideration, the promoters of the thesaurus target-language procedure could, and on occasion did, claim that they were mathematicizing Plato; Richens, with an equal justice, could be said to be mathematicizing Aristotle. Thus, with sophistications on both sides, the age-old controversy in philosophy between nominalists and realists took, in the research conferences of the Cambridge Language Research Unit, a strange, fascinating, esoteric new turn.

Secondly, it became clear that if a well-grounded decision was to be made between the policy of interlingualizing the thesaurus, (that is, of assimilating semantics to syntax) and that of thesaurizing the syntax (that is, of including syntax within semantics) the linguists would have to be called in. In fact, for a time, they would have to be given charge. In the attempt to decide between these two alternatives, the Unit had developed two complementary lines of research. In the first, Richens designed an interlingual program complete with dictionary for translating syntax, beginning with translation from Italian into English, but subject to continual test by translation from other languages. In this test the object was to see how, with a very rough-and-ready method of translating metaphor and idiom, but with a very advanced and sophisticated method of translating syntax, intelligible translations of scientific texts could be made without using a thesaurus. In the second line of research, transformations were made from thesaurus-heads to texts and then back again within one language, without any procedure being used to translate from one language to another, or to translate syntax. The linguists were then invited to comment on and improve both of these lines, in order to see whether or not they tended to contrast or converge.

Halliday's sophistication of the Richens interlingual syntax translation program was of the following general form. For the general description of it I quote his own words:⁸

"... Translation ... is a form of comparative descriptive linguistics; but whereas translation between a given pair of languages requires only particular (one language) and comparative (in this case transfer, i.e. two languages) description, we envisage it as a requirement of mechanical translation that the program should be applicable to translation among all languages, and therefore we must face the necessity of universal (all languages) description ... Clearly if work was concentrated on a one-one translation field, where only a straight transfer description is required, results might be expected much more quickly. But the whole program might have to be remade for each pair of languages, and [so] it seems preferable to aim at a universal linguistic translation program applicable to translation between any pair of languages.

"This wider aim can only be achieved by a rigorous separation of the particular from the comparative universal range of validity (in MT terminology, of monolingual from interlingual features), and by their separate handling in the program ... The basic problem in the grammar is the setting up of relations among the particular grammatical structures of different languages ... It seems clear that considerable use can be made, both in the dictionary entry and in the operations, of the descriptive distinction between those chunks [separable segments of words⁹] which can be fully identified in the grammatical analysis (i.e. grammatical chunks or 'operators') and those only partially identified in the grammar and requiring further, lexical, information (i.e. lexical chunks or 'arguments'). This is of course an arbitrary distinction made for mechanical translation purposes; it reflects the different fields of application of the grammar and the dictionary in descriptive linguistics ... Comparative linguistics has the theoretical equipment [for establishing a universal description of syntax] by reference to categories of context grammar; and the systems of context-grammar categories set up for mechanical translation make up a grammatical interlingua such that any single language is capable of comparison with them. This grammatical interlingua ... is not a universal language, which would merely turn the number of languages we have to deal with from n to n + 1, but a set of systems of grammatical relations identified in context grammar, of the type that one sets up for the comparative identification of grammatical categories in descriptive linguistics ... The method [of setting these systems up] which seems at present likely to be most fruitful, and [which] is being tried out on a limited number of languages, (Italian, Chinese, English, Russian and Malay in the first instance), is [first] to establish a rigid operator/argument distinction, and [then] to identify the operators by their placing in a number (provisionally about 60) of two term systems each term being a yes-or-no function, ... The arguments are then classified by reference to grouping of these systems ..."

8. From "The Linguistic Basis of Mechanical Translation" (Report for the Eighth International Congress of Linguists, University of Oslo, 1957; in the press).
9. See Richens and Halliday, "Word Decomposition for Machine Translation;" presented to the Georgetown University Eighth Round Table Meeting on Linguistics and Language Studies, April, 1957, and to appear in its Proceedings (in the press).

Halliday's method, then, stripped to its essentials, is first to make a monolingual grammar of each language, and then, distinct from this, an interlingual analysis. The monolingual grammar is of the kind normally produced by descriptive linguists, except that it is only for the operators of each language; it is by reference to these operators that the arguments are, later, to be defined. This monolingual grammar can, at a later stage, be mathematically related to the interlingual analysis of these same operators, but is initially sharply to be contrasted with it, since it is to be based on extra-linguistic, not on intra-linguistic context.¹⁰ The interlingual analysis, the making of which is the key to the whole problem, is achieved by the following method. With regard to each operator in question, the analyst asks himself a number of extremely simple questions, questions so simple, in fact, that he can unhesitatingly answer, with regard to them, "Yes," "No," "Both," "Neither" ("Neither" meaning also "The question is inapplicable"). For instance, take the French operator la, the function of which, for mechanical translation purposes, is always very difficult to define, since, speaking vaguely, it can serve either as a feminine definite article or as a feminine accusative pronoun. We assume that la has already been monolingually placed within a set of monolingual grammatical systems, including a two-gender system, which apply to French only. We therefore feel free to ask, interlingually, not "Does la belong to any gender system?" because it is notorious that gender systems, as between languages, do not correspond, but, far more simply, "Can la, under any circumstances, tell us anything about sex?" Thus, by this change of question, we are exchanging a reference to the intra-linguistic context, (i.e., that of French) for the far more stable extra-linguistic context, i.e., that of the division of the human race into two sexes. English has no genders, French two, German three, Icelandic six; but Englishmen, Frenchmen, Germans and Icelanders alike all fall into communities consisting of two, and only two, sexes. Thus, with regard to the French operator la, when we ask, "Can it, ever, tell us anything about sex?" we can instantly and unhesitatingly answer, "Yes, it does." Proceeding to the next question, we ask, "Does la apply to animate/inanimate objects?" to which the answer is, "It applies to both." To the next question, "Does la apply to present/non-present time?" the answer is, "Neither; the question is inapplicable." "Does la refer to proximate/distant regions of space?" Answer, "Neither; the question is inapplicable." (With regard to the French operator là this question could be answered; but not with regard to la), and so on.

10. M. A. K. Halliday, "Some Aspects of Systematic Description and Comparison in Grammatical Analysis" (Studies in Linguistic Analysis; Philological Society Special Volume, London, 1957).

The heart of the whole method lies in the application of the precise and elegant methods used by contemporary descriptive linguistics to analyze monolingual context grammar (methods which amount in effect to analyzing the older compendium units "verb," "adjective," "noun" and the rest into weaker but more stably definable unitary components from which any required variant of the compendium units can be built up) to analysis of extra-linguistic context (Halliday; June, 1957). In this latter case the extra-linguistic contexts can be universal ones, and the compendium units are the actual operators themselves. In other words, by taking seriously the analogy which has always been known to exist to some extent between intra-linguistic and extra-linguistic context, and by treating the first as a straight extension of the second, Halliday has shown that he can achieve, for practical purposes, a non-contentious method of universal grammatical description. (By 'non-contentious' I here mean only, 'a method which will produce the same answers to the same questions when applied to the same operators by different analysts.') Moreover, the preliminary use of this method gives some provisional reason to think that the more complete and comprehensive the series of "Yes/No" questions which are asked (however large it is, the list will be objectively determinable and finite) the more closely the numbers of operators in each language come to approximate to one another. The result, if it is confirmed, will be very useful for mechanical translation, since it means that, with regard to any language, the operator category will be checked and redefined by the interlingual analytic process itself.

Thus Halliday's suggestion for sophisticating Richens' translation program is already of considerable research interest, since it shows that even so initially general and purely logical a research project such as that of Richens can be re-envisaged as arising out of a valid linguistic field. Halliday's suggestion is also hopeful in that preliminary research trials show that it does provide a paradigm, or model, for the rapid construction of operator dictionaries. Thus the Unit has plans to prepare such dictionaries in Italian, Standard Chinese, Cantonese, Malay, Hindi, Russian, Turkish, English, French, and German, these being the languages for which the dictionary makers are readily available. If the method justifies itself, other languages, without too much strain, can be added to these. The second consideration which can be derived from studying Halliday's schema is that he is, in effect, making a syntactical thesaurus. Several of the yes-no questions by which he establishes the components of his categories, for instance, "Does this operator apply to animate/inanimate objects?" "Does this operator assert a fact / give an implication?" "Does this operator indicate completion/non-completion?" "Does this operator indicate duration/non-duration?" could equally well be used as part of a schema for classifying synonyms under given thesaurus-heads. Thus a convergence between the interlingual and thesaurus approaches is detectable here. What is not yet established, as must be made clear, is whether the additional complexity which Halliday desires to insert into the very simple and elegant translation program of Richens will really improve the quality of the translation produced by it. A test is being devised of the capacities of the original and amended versions to translate prepositional phrases.

Meanwhile, another feature has emerged, in that Halliday's amendments to Richens' program have strengthened the case for coding this program to go through the computer by using the very general mathematical system known as lattice theory. (The use of lattice theory for the analysis of language will effect an analysis congruent to the ideas of those linguists who can, in any sustained way, imagine language as a net. On a first approximation, a lattice is an asymmetric net; a finite lattice is a fishing net or hammock, though an asymmetric one; that is, a net with a single top point and bottom point. Such nets are built up from a single asymmetric binary relation, which itself derives, though over some distance of time, from the asymmetric binary relation used by George Boole, and which was suggested to him by the linguistic adjective-noun relation.) Preliminary grounds for using this mathematical system to algorithmize the translation of syntax had already been given in earlier papers by the members of the Unit.¹¹ Moreover, the fact that the Richens interlingua had already been shown to constitute an algebraic system weaker than lattice theory, though not incongruent with it, increased the ground for re-mathematicizing it by trying on it a mathematical system of the same kind as itself, though of more algorithmic power. And Halliday's analysis, being as it is in terms of dichotomies, (and of systems which can be constructed by successions of dichotomies) straightforwardly uses lattice theory by its very nature. Either, therefore, it must be compressed and coded by initially using this system, or it cannot be compressed and coded at all. Some idea can be gathered, however, of the extent of the complication which Halliday's suggestion introduces into Richens' program from the fact that whereas an entry of 20 bits (20 binary digits) per chunk would have sufficed Richens to translate both meaning and syntax, Halliday's amendment will require an entry of at least 120 bits per chunk for syntax translation alone.

11. See MT, Vol. 3, No. 1, pp. 2-28 (report on the Colloquium of the C. L. R. Group, August, 1955); and M. Masterman, "The Comparative Analysis of a Chinese Sentence," (annex to the report, available from the Editor of MT).
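Halliday's four possible answers per dichotomy, and the remark that such an analysis "uses lattice theory by its very nature", can be illustrated with a short Python sketch. It rests on assumptions that the paper does not state: each two-term system is stored as two bits (one yes-or-no bit per term), "Both" sets both bits and "Neither" sets none, and the ordering is bitwise inclusion, so that meet and join become bitwise AND and OR. All names are hypothetical. Only the three answers for la are taken from the text; the second operator is invented purely to have something to compare la with. On this reading, 60 two-term systems come to 120 bits, which matches the figure quoted above, though the paper does not give the derivation.

```python
# Three of the dichotomies named in the text, in a fixed order.
SYSTEMS = ["animate/inanimate", "present/non-present", "proximate/distant"]

# Assumed encoding: one bit per term of the system (first term, second term).
ANSWER_BITS = {"first": (1, 0), "second": (0, 1), "both": (1, 1), "neither": (0, 0)}


def encode(answers):
    """Pack one answer per system into an integer, two bits per system."""
    code = 0
    for system in SYSTEMS:
        first, second = ANSWER_BITS[answers[system]]
        code = (code << 2) | (first << 1) | second
    return code


def meet(a, b):
    """Greatest lower bound: the features both entries assert."""
    return a & b


def join(a, b):
    """Least upper bound: the features either entry asserts."""
    return a | b


def bits_per_entry(number_of_systems):
    """Entry length under the two-bits-per-system assumption."""
    return 2 * number_of_systems


# Answers for the French operator 'la', as given in the text.
LA = {"animate/inanimate": "both", "present/non-present": "neither",
      "proximate/distant": "neither"}

# A purely hypothetical operator, invented for comparison.
HYPOTHETICAL = {"animate/inanimate": "first", "present/non-present": "neither",
                "proximate/distant": "first"}

if __name__ == "__main__":
    la_code, other_code = encode(LA), encode(HYPOTHETICAL)
    print(format(la_code, "06b"), format(other_code, "06b"))
    print(format(meet(la_code, other_code), "06b"), format(join(la_code, other_code), "06b"))
    print(bits_per_entry(60))
```

Because meet and join need only bitwise AND and OR, an entry coded this way can be compared with another without any arithmetic on its contents.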
Fortunately, Dr. Gilbert King, who was mentioned earlier, and who now is a member of the Unit's Consultative Committee, considers it feasible, from the engineering point of view, to construct a mechanical translator which will perform lattice operations but not arithmetical ones, and which will allow of chunk entries 1,000 bits long.¹² For existing computers, however, Halliday's schema would be too complex by far. This should not blind us to its intrinsic interest or to its many potential advantages; but it should be borne in mind by those linguists who are seriously interested in developing machine translation as a concrete reminder that, for every increase in linguistic analytic complexity, a heavy electronic price has to be paid.

12. G. King, The Requirements of Lexical Storage (International Telemeter Corporation, 1957).

Turning now from syntax without semantics to semantics without syntax, a word must be said about the Unit's second research project, namely that of examining the interrelations between texts and their constituent thesaurus-heads without the complicating intervention of a foreign language. Dr. E. W. Bastin, Karen Jones, M. M. Masterman, R. H. Needham, A. F. Parker-Rhodes, A. R. Penny, Dr. R. H. Thouless and W. F. Woolner-Bird have made the principal contributions.

The first provisional discovery made by the members of this research group was that paragraphs of lecture-style discourse could, without difficulty, be constructed by the intuitive use of a minimum number of thesaurus-heads. Thus a paragraph dilating pompously but not vacuously on the present peculiar scientific position of the study of parapsychology was constructed by Dr. Thouless and Margaret Masterman, for thesaurus demonstration purposes, using only four lists of thesaurus synonyms to supply all the argument words. These lists concerned the generic ideas of 'Wonder' (with a cross reference to 'Interest'), 'Science,' 'Parapsychology,' and of a very general topic within which 'Appearance in Thought' contrasted with 'Instantiation in Reality,' the two combined heads forming an antithetic pair. The method by which the paragraph was constructed was suggested by one of the Unit's programmers, Lady Hoskyns. If Interest be A1, Wonder A2, Instantiation in Fact B, Psychical Research C and Science D, then the paragraph constructed by Dr. Thouless can be thesaurized as follows:

"'Interest' [A1] in 'psychical research' [C] is often 'motivated' [A1] by 'wonder' [A2] at 'phenomena' [C] which 'appear to be' [B] 'marvellous' [A2]. The 'sitter' [C] is 'amazed' [A2] at the 'wonderful' [A2] 'results' [D and B] of 'card-guessing experiments' [C] which 'leave him in a state of' [B] 'bewilderment' [A2], 'seeming' [B], as they do, 'to savour of' [B] 'necromancy' [A2]. This 'attitude' [A1] of 'awe' [A2] (or of 'admiration' [A2], as it would earlier 'have been called' [B]) 'produces' [B] a 'fascination' [A2] with the 'subject' [C and D]. The 'new-comer's' [C] 'surprise' [A2] 'leads' [B] often to 'stupefaction' [A2], and the 'research' [D] is 'treated' [D] as a 'sensation' [A2] rather than as a 'serious' [A1] 'branch of science' [C and D]."

Other paragraphs, giving the obituary of an imaginary well-known biologist, an advertisement for a film star, and a denunciation of the British Conservative Party, were similarly constructed. The introduction of a randomizing procedure, with the object of mechanizing the selection of synonyms, caused a paragraph of esoteric theology, and also one denouncing philosophic scepticism, to be a little more irrational than they would otherwise have been, but not very much. Attempts rapidly followed to use this method to construct parody (Thouless and Parker-Rhodes); to simulate essay writing (Woolner-Bird); and to employ it to analyze chapters instead of paragraphs (Needham and Jones).

Several facts of considerable interest emerged. One was that, in any kind of writing which builds up into an argument, thesaurus-heads tend to be introduced in powers of two, each topic being introduced concurrently with that to which it primarily contrasts. Another was that the introduction of a new thesaurus topic, in discursive writing, tends to follow a clustering of re-allusions to a single one of the topics which have been introduced earlier, and which are themselves synonymous, in such a way as to force the selection of the new thesaurus-head. This result was reached independently by Woolner-Bird and by Needham and Jones (by analysis of Southern, Cultural Aspects of European Territorial Expansion.) A third fact which emerged was that, if the unit to be analyzed consisted of a chapter, rather than a paragraph (that is, of a piece of discourse with an order of, say, 20 enlarged thesaurus-heads), a sub-class of these heads, say, 2 or 4, will have vastly more synonyms of themselves occurring in the chapter than will any of the others; so that this sub-class of heads, taken in a prescribed ordering, can be taken as a title for the whole chapter.
  8. 42 M. M. Masterman the members of the Unit will have to face t hesaurus-heads), a sub-class of these heads, squarely if they are to construct a full-scale say, 2 or 4, will have vastly more synonyms translation thesaurus. The creative ability of of themselves occurring in the chapter than man is not so easily amenable to mechanization, will any of the others; so that this sub-class in this field, as the Unit's early, gaily-reached of heads, taken in a prescribed ordering, can results, would seem to imply. In other words, be taken as a title for the whole chapter. A with every text we analyze it becomes increas- f ourth fact, of very general interest, was that ingly evident that every discursive writer con- there are some thesaurus-heads which always structs his own thesaurus. How then is the have to be constructed to analyze discourse; Unit to construct a thesaurus which has any that is, which occur so constantly that it seems hope of applying to more than one text? almost impossible to think without them. One One immediate reply to this capital difficulty of these conveys the very idea of a synonym: is by asking another question: "How, equally, " is, constitutes, appears to be, seems to be does any linguist compile a dictionary which equatable with, shows itself to be, constitutes fully applies to more than one text?" In a the fact that; namely, that is, in other words; paper on categorization of lexis, recently read could be called, could be treated as, could be to a meeting of the Language Research Group considered as; this comes to saying, this at Cambridge, R. A.Crossland suggested that a comes to the same thing as saying. . " These procedure of selection out of a thesaurus-head, and their like appear in every text; (including alternative or preferably supplementary to any the present report). 
So do synonyms of the procedure based on contextual distribution, very general generic idea of causation: might be based on the traditional dictionary- " causes, promotes, produces, leads to, de- maker's technique of classifying words as ap- termines, results in; the result is, the upshot propriate to particular general contexts or is, in the end, we find that we can say that.. " types of diction. 13 Such indication is given So do synonyms for the very basic idea of ap- only sporadically and somewhat unsystemati- pearing to be one thing, while turning out in cally in most existing dictionaries, but, with fact to be another. (This generic idea precedes refinement, it might provide a technique for nearly every introduction of contrast.) Since programing the computer to make an appro- these thesaurus topics so constantly occur, it priate choice from among the possible alter- might be argued that their constituent synonyms natives in a thesaurus-head, especially when were functioning as a queerly determined class this is to be used in the final stage of transla- o f syntactical operators, rather than as argu- tion. Two methods of providing this selection ments. Moreover, since, in order to analyze suggest themselves. Either information about the chapter of a book into its constituent the appurtenance of a word in a source language thesaurus-heads, a distinction has to be estab- to different dictions ("high" or "low" style, the lished, and in a non-contentious manner, be- styles of various technologies, etc. 14 ), is re- tween new ideas (formalized by P), qualifiers, corded and passed through the interlingual stage, to be taken as a single element with what they though the computer in that stage translates qualify (formalized by Q's) and re-allusions to just an approximate lexical equivalent (the key ideas previously mentioned (formalized by R's); word of a thesaurus-head, perhaps). 
Or else, and as all these have to be distinguished from without the recording and transmission of such O 's, or operators, it becomes clear that if information, an appropriate equivalent, out of Halliday, to translate syntax, has to construct a head "labelled" according to the appurtenance a new type of universalized thesaurus, so also of its constituent elements to different dictions, the thesaurus makers, in order to analyze the would be selected in accordance with general semantic patterns occurring in texts, have to construct a very basic, simple kind of syntax. All of which gives reason to hope that in some w ay (the members of the Unit do not yet see 13. Diction s eems now to be virtually a syn- h ow) the interlingual program for translating onym in philological discussion for "verbal or syntax, and the analytic program for construct- written style" (cf. Oxford English Dictionary). ing texts from thesaurus-heads, or thesaurus- 14. Crossland noted the element of subjectivity heads from texts, may all turn out to be differ- involved in categorization not based on detailed ent parts of the same program, in the end. analysis of contextual distribution within re- In conclusion, a final word must be added on stricted textual material. one problem of thesaurus construction which
  9. S yntax and Semantics 43 and immediate context, (either by the procedure in the sense that it follows from this, that some- described earlier, or by some other mechaniz- how or other, human beings do succeed, in dis- able procedure to be substituted for it), within cursive argument, in communicating to one an- t he set of such heads constituting the "rough other the boundaries of their respective the- output." sauri; for if they did not, there would be no argument. We know this; for when communi- If any of these suggestions proves fruitful, it cation fails to take place, we say, "I cannot would seem likely, on the face of it, that new u nderstand the writer; he is too allusive." thesauri will have to be prepared, or existing W hat we say, in making such a comment, is ones reorganized by "labelling" of items and the opposite of what we actually mean; be- no doubt by addition, deletion and rearrange- c ause what we mean is that such a writer ment, for languages between which translation does not take the trouble to order and display is envisaged. Also it might be useful to pre- the re-allusions to his main ideas sufficiently pare thesauri on the basis of particular scien- f or us to "catch" his personal procedure of tific or other specialized "dictions." These synonym creation; that is, sufficiently for us could be considered valid in practice for fairly to ascertain his thesaurus. And when we say extensive categories of writers, though in prin- this, it is further intuitively clear that we must ciple the argument that every writer has his be referring to some objective communication- own thesaurus, based on what he alone desires promoting procedure; some procedure which t o write or has written, seems reasonable we use, without being aware that we use it, when- enough. ever we argue discursively with one another. 
Whether the Cambridge Research Unit will really succeed in compiling such a gigantic, universally valid, thesaurus of thesauri is not The task that confronts us, then, though for- yet clear. What is clear, in the sense that it midable, is not hopeless. Objective synonym- is becoming established as a thesis supported creating procedures which can be employed, by considerable factual evidence, is that when can also be discovered; and logicians, diction- a human being thinks discursively he does use ary makers and descriptive linguists are just a thesaurus. Secondly, it is intuitively clear, the men to discover them.
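
As an illustrative note on the chapter analysis reported above, the observation that a small sub-class of heads accounts for most of the synonyms in a text, and so can serve as its title, may be sketched in a few lines of Python. This is only a sketch under stated assumptions and not the Unit's punched-card procedure: the toy thesaurus `HEADS` is a hypothetical subset of the words tagged in the parapsychology paragraph, and the names `head_counts` and `dominant_heads` are invented for the illustration.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus: a small subset of the words tagged in the
# parapsychology paragraph, keyed by the head labels used there.
HEADS = {
    "A1": {"interest", "motivated", "attitude", "serious"},
    "A2": {"wonder", "marvellous", "amazed", "wonderful", "bewilderment",
           "necromancy", "awe", "admiration", "fascination", "surprise",
           "stupefaction", "sensation"},
    "B": {"appear", "seeming", "produces", "leads"},
    "C": {"sitter", "phenomena"},
    "D": {"research", "treated"},
}

# Invert the thesaurus so that each word points at its head.
WORD_TO_HEAD = {word: head for head, words in HEADS.items() for word in words}


def head_counts(text):
    """Count how many words of `text` fall under each thesaurus-head."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(WORD_TO_HEAD[w] for w in words if w in WORD_TO_HEAD)


def dominant_heads(counts, k=2):
    """Return the k most frequent heads, most frequent first."""
    return [head for head, _ in counts.most_common(k)]


sample = ("The sitter is amazed at the wonderful results, which leave him "
          "in bewilderment and awe.")
counts = head_counts(sample)
print(dominant_heads(counts, k=1))  # prints ['A2']
```

With a full chapter and an enlarged set of heads, the same count would be expected to single out the 2 or 4 heads that the text describes as a possible title. Single-word lookup is a simplification made here; the phrases tagged in the original paragraph ('appear to be', 'card-guessing experiments') would need phrase matching.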


