Báo cáo khoa học: " A Type of Program for Mechanical Translation"
lượt xem 2
download
A program for the mechanical translation of a limited French vocabulary into English was constructed for operation on the computer APEXC. Its principal features were an improved routine for dictionary look-up, and an organization permitting systematic incorporation of additional subroutines.
Bình luận(0) Đăng nhập để gửi bình luận!
Nội dung Text: Báo cáo khoa học: " A Type of Program for Mechanical Translation"
- [Mechanical Translation, vol.4, no.3, December 1957; pp. 54-58] A Type of Program for Mechanical Translation J. P. Cleave, University of Southampton, Southampton, England* A p rogram for the mechanical translation of a limited French vocabulary into Eng- l ish was constructed for operation on the computer APEXC. Its principal features w ere an improved routine for dictionary look-up, and an organization permitting s ystematic incorporation of additional subroutines. A program for syntactic p rocessing was constructed but was too large for the available storage space. I t examined preceding and following items — stems or endings — in order to c hoose correct equivalents, and used a dictionary of syntactic sequences or s tructures to effect local word-order change. APEXC A PEXC has one branch (jump) instruction d iscriminating between positive (or zero) and T he computer has a magnetic drum store negative. w ith 1024 locations arranged in 32 tracks each T he following abbreviations will be used: o f 32 locations. Each location contains 32 bits. A ny location can therefore be specified by an Ox operand address (X-address) of an a ddress of 10 bits. Both data and instructions instruction O. a re stored on the drum. Oy next instruction address (Y-address) of O. A n instruction consists of 32 binary digits and (Ox)ls least significant digit of Ox (i.e., s pecifies an operation (function), the 10 bit ad- digit 10). d ress of an operand contained in the store and (Oy)ms most significant digit of Oy (i.e., t he address (10 bits) of the next instruction, digit 11). w hich again is contained in one location in the (z) contents of the location whose address s tore. The arrangement of the digits of an in- is z. s truction is shown below (Fig. 1). Dictionary Subroutines The dictionary procedure is best explained by considering a simplified example with a diction- * T his paper is a report of work done in ary of 16 positive entries stored in increasing c ooperation with Dr. A. D. Booth and Mr. L. numerical order in locations 1, 2, 3, ... 16. B randwood at the Computational Laboratory, Suppose W is a word, known to be in the dic- Birkbeck College, London. tionary, whose address in the dictionary is required.
- Program for MT 55 Figure 2 The bracketing procedure1 requires us to start If we now examine the dictionary entry specified in the middle of the dictionary, either at 8 or 9. by Ox at the beginning of operation 4, it can S uppose 8 is chosen; the procedure for 9 is be seen that W is either in Ox or Ox + 1 . (If analogous (see Fig. 2). the initial location had been 9, the alternatives would be Ox and Ox - 1.) Hitherto, dictionary An "operation" consists of forming W-(y) by subroutines we have used counted the number of m eans of a subtraction instruction O. If the operations performed and at the final operation result is positive, a "probe-number" p is added tested Ox and its neighbor for identity with W. to Ox , if negative it is subtracted, p is then T his latter test had to be synthesized and so divided by 2. required several instructions. This disadvan- T he first operation is on (8) (i.e., O x = 8 ) tage can be eliminated if the final operation is s imilar to its predecessors. with p = 22 . After the operation Ox = 12 or 4 Suppose operation 4 is similar to 1, 2, 3. ( i. e . , Ox = 8 + 22 or 8 - 22 ), the new probe- At the conclusion of the third test p = 2-1 number is p = 2 1 . = 1/2. This is a '1' in (Oy)ms . The X- addresses formed are shown in Fig. 3. T he second operation gives a new probe- If the initial location is 9 and (Oy)ms prior number of 2 0 . The third test, therefore, t o operation 3 is '0', the correct address of W shows W to be in one of the 8 sets of 2 shown in the dictionary will be formed in Ox. But Oy.. in the diagram. i s the address of the next instruction to O in The fourth operation is slightly different from the dictionary routine and is altered by the ad- those preceding. It can be seen that operations dition of 2-1 to Ox to Oy' = Ov + 29, thus 1, 2, 3 each discriminate between two new ad- enabling a jump to occur at precisely the right dresses: the fourth discriminates between one moment in the sequence of operations. Oy' is new address and one that has been tested before. the address of the first instruction of the rou- tine following dictionary look-up. If the initial 1. Booth, A. D., "Use of a Computing Machine as a Mechanical Dictionary", Nature, vol. 176, Sept. 17th, 1955, p.565.
- 56 J. P. Cleave F igure 3 r -1 m = [1.1 + 2.2 + 4.3 + . . . + r2 +... l ocation is 8, W is located correctly only if n-1 + n)] /2n + (n2 ( O y )ms = 1 Here O y’ = O y - 29 = n - 1 + ( 1 + n)/2 n . T he efficacy of this method clearly depends T hus if n is large only one operation is saved; u pon the fact that (O x ) ls i s next to (Oy) ms t he extra programming required in a test for ( see Fig. 1). This convenient arrangement now z ero is therefore not worth-while with a com- e nables us to dispense with special arrange- p uter without this facility. m ents for the final operation, counting the num- b er of operations performed and special orders T he Basic MT Program f or jumping to the next sequence. The diction- A ll data to be "recognized" were, with a few a ry program now occupies only 11 locations: e xceptions, included in the main dictionary. i t was used in the MT program explained below. T he input routine compared sequences of sym- I f the W is not in the dictionary, then this b ols between "space" marks with the dictionary m ethod of dictionary look-up will select the e ntries. This routine therefore had only to rec- g reatest entry less than W. o gnize a "space" symbol on the input tape. All I t might be supposed that a further increase p unctuation marks, and the symbol for the end of speed could be obtained if during each of the o f text, were included as dictionary entries. a bove operations a test for zero is made ( i . e . , E ach dictionary entry D of the main- and identity between W and the dictionary entry). e nding-dictionaries was confined to one storage Suppose a dictionary of 2n e ntries. One dic- l ocation and had two equivalents. The second t ionary entry can be located during the 1st test, o f these, E 2 , was the target language equiva- 2 d uring the 2nd, 4 during the 3rd, . . . . 2 r-1 l ent of the dictionary entry. In general E 2 o c- c upied several locations. All "syntactical" d uring the r th , . . . ; 2 n -1 + 1 requires n tests. o perations were performed on the "first equiv- ( The extra 1 is an entry that cannot be located a lents, " E 1 , e ach of which occupied only one b y a zero test: in the examples of Fig. 2, s torage location. Each E 1 w as constructed e ither 1, or 16.) Assuming that each entry u niformly and consisted of three sets of ten i s equally likely to occur in a text, the average digits specifying addresses E1(l), E1 (2), E1(3). n umber of operations to locate a single word is ( See Fig. 4.)
- Program for MT 57 dress E1(1) = S, the address of the initial in- struction of a routine for processing the accu- mulated data in S. (Fig. 5 . ) E1(l) for an e nd-of-text symbol was ε , a stop order. A program for processing the first equiva- lents was constructed but was found to be too large for the available storage space and was abandoned. The plan of this routine, however, will be stated. The processing of S 1 c onsisted of carrying out in turn the operations whose first instruc- t ions were determined by the second address E 1(2) of each first equivalent in S 1. These operations — condition routines — had two functions. The first was to examine, where necessary, equivalents preceding and following to determine whether E1(3) specified the cor- rect second equivalent. The second function was to place a code number C corresponding to E in another series of locations S2. Convenient sub-sequences of the code numbers in S 2 were then compared to a "structure-dictionary." Recognition of these sub-sequences resulted in a rearrangement of the order of the recognized
- 58 J. P. Cleave C-sequence and the corresponding E1 -sequence. simplicity was achieved than if the foreign lan- The code-numbers were therefore assigned in guage words or target language words had been such a manner that the sequences requiring re- processed directly. arrangement could be recognized distinctly. Secondly, the distinct parts of the whole pro- Although in most cases this assignment coin- gram were isolated, the linkages being supplied cided with the usual classification of verb, pro- by the addresses in the first equivalents. Thus n oun, etc., there were some C which did not extra subroutines could be constructed and correspond to these categories. Thus donn linked to the program merely by altering ad- was entered in the main dictionary, with 'give' dresses in the relevant first equivalents. For as the target language equivalent. The condi- instance, if a more refined condition routine tion routine for this entry assigned a code num- was necessary for a certain set of first equiva- b er (verb 1 ) to it. erons w as an entry in the lents, this routine could be placed in the store verb-ending dictionary. The condition routine and the second addresses of the first equiva- determined by its first equivalent gave it a code lents altered to the address of the initial order number (verb2). The second equivalent of of the new routine. erons was 'will'. Thus when donnerons oc- The size of storage in the computer imposed curred in the input text, the first equivalents of severe limits on the extent and performance of donn and erons were placed in consecutive lo- the program. Thus very small dictionaries cations in S1. When the condition routines were were used, although best use was made of the operated, the code numbers (verb1) and (verb2) space available by means of stem-ending split- w ere placed in order in S 2 . Following these ting. Apart from these faults, there were two r outines the structure dictionary recognized inherent drawbacks of the above type of program. the sequence (verb 1) (verb2) as one requiring The use of separate condition routines em- transposition. The corresponding data in S 1 ploying a matching procedure to examine the were then transposed. Thus the final printing minor context of a first equivalent lead to an o peration printed the target language equiva- excessive program. A more economical ap- l ents of donn/erons i n reverse order to yield proach would be to calculate correct alterna- 'will give'. This procedure was used to per- tives from code numbers by some means. This form the pronoun-verb inversion. would greatly reduce the storage space as- The final stage of the program was a routine signed to this particular part of the program. for printing the second equivalents. In the pro- Secondly, the method of effecting change of gram which was put on APEXC the processing w ord order appears to be applicable only to of S1 was omitted so that the dictionary rou- subsections of languages where permutation of tines were immediately followed by the print target language order into foreign language routine. The print routine printed the contents order is purely local. Thus if a set of n con- of the addresses specified by the 3rd address secutive code numbers in S 2 was matched by o f the first equivalents in S 1. Each location the above method to a dictionary of structures, containing a second equivalent also contained the change of word order was confined to the an indication of whether the content of the next corresponding set of n first equivalents only. location was also to be printed. By this means This process was clearly incapable of dealing equivalents of any desired length could be directly with rearrangements of blocks of words. printed. A possible solution of the problem here would b e to use two structure-dictionaries, one for S ome Characteristics of the Program permuting elements within a block, another to This program had two important features. permute the blocks. The necessity of using a Firstly, all operations within the program structure-dictionary will disappear when a suit- were carried out on the first equivalents. As able technique of calculation (as opposed to these were uniformly constructed, a greater matching) has been discovered.
CÓ THỂ BẠN MUỐN DOWNLOAD
-
Báo cáo khoa học: Khả năng kháng bệnh bạc lá của các dòng lúa chỉ thị (tester) chứa đa gen kháng với một số chủng vi khuẩn Xanthomonas oryzae pv. oryzae gây bệnh bạc lá lúa phổ biến ở miền Bắc Việt Nam
7 p | 318 | 49
-
Báo cáo khoa học: THIẾT KẾ VECTOR MANG CẤU TRÚC GEN HA1 BIỂU HIỆN KHÁNG NGUYÊN BỀ MẶT VIRUS CÚM A/H5N1 Ở THỰC VẬT
8 p | 262 | 20
-
Báo cáo khoa học: " A novel approach to inhibit HIV-1 infection and enhance lysis of HIV by a targeted activator of complement"
4 p | 78 | 6
-
báo cáo khoa học: " Part I, Patient perspective: activating patients to engage their providers in the use of evidencebased medicine: a qualitative evaluation of the VA Project to Implement Diuretics (VAPID)"
11 p | 122 | 5
-
Báo cáo khoa học: "The complete genome of klassevirus – a novel picornavirus in pediatric stool"
9 p | 91 | 4
-
báo cáo khoa học: "Peritoneal mesothelioma in a woman who has survived for seven years: a case report"
4 p | 95 | 4
-
báo cáo khoa học: " Dental and craniofacial characteristics in a patient with Dubowitz syndrome: a case report"
5 p | 121 | 4
-
báo cáo khoa học: " Marketing depression care management to employers: design of a randomized controlled trial"
7 p | 106 | 4
-
báo cáo khoa học: " Looking inside the black box: a theory-based process evaluation alongside a randomised controlled trial of printed educational materials (the Ontario printed educational message, OPEM) to improve referral and prescribing practices in primary care in Ontario, Canada"
8 p | 128 | 4
-
báo cáo khoa học: " Testing a TheoRY-inspired MEssage ('TRY-ME'): a sub-trial within the Ontario Printed Educational Message (OPEM) trial"
8 p | 72 | 3
-
Báo cáo khoa học: " Klassevirus 1, a previously undescribed member of the family Picornaviridae, is globally widespread"
7 p | 59 | 3
-
báo cáo khoa học: " Which factors explain variation in intention to disclose a diagnosis of dementia? A theory-based survey of mental health professionals"
10 p | 86 | 3
-
Báo cáo khoa học: " Genetic predisposition to chikungunya – a blood group study in chikungunya affected families"
3 p | 60 | 3
-
Báo cáo khoa học: " Comparative analysis between a low pathogenic and a high pathogenic influenza H5 hemagglutinin in cell entry"
5 p | 70 | 3
-
Báo cáo khoa học: " The directionality of the nuclear transport of the influenza A genome is driven by selective exposure of nuclear localization sequences on nucleoprotein"
12 p | 64 | 3
-
Báo cáo khoa học: " In vitro host range, multiplication and virion forms of recombinant viruses obtained from co-infection in vitro with a vaccinia-vectored influenza vaccine and a naturally occurring cowpox virus isolate"
13 p | 66 | 3
-
báo cáo khoa học: " Transformational leadership, transnational culture and political competence in globalizing health care services: a case study of Jordan's King Hussein Cancer Center"
13 p | 123 | 3
-
Báo cáo khoa học: "Evolution of the M gene of the influenza A virus in different host species: large-scale sequence analysis"
13 p | 66 | 3
Chịu trách nhiệm nội dung:
Nguyễn Công Hà - Giám đốc Công ty TNHH TÀI LIỆU TRỰC TUYẾN VI NA
LIÊN HỆ
Địa chỉ: P402, 54A Nơ Trang Long, Phường 14, Q.Bình Thạnh, TP.HCM
Hotline: 093 303 0098
Email: support@tailieu.vn