[Mechanical Translation, vol.5, no.1, July 1958; pp. 25-41]

A Programming Language for Mechanical Translation†

Victor H. Yngve, Massachusetts Institute of Technology, Cambridge, Massachusetts

A notational system for use in writing translation routines and related programs is described. The system is specially designed to be convenient for the linguist so that he can do his own programming. Programs in this notation can be converted into computer programs automatically by the computer. This article presents complete instructions for using the notation and includes some illustrative programs.

IT HAS BEEN SAID that the automatic digital computer can do anything with symbols that we can tell it in detail how to do. If we are interested in telling a digital computer to translate texts from one language into another language, we are faced with two tasks. We first have to find out in detail how to translate a text from one language to another. Then we have to "tell" the computer how to do it. This paper is concerned with the second task. We will present here a specially devised language in which the linguist can conveniently "tell" the computer to do things that he wants it to do.

The automatic digital computer has been designed to handle mathematical problems. It is able to carry out complicated routines in terms of a few different kinds of elementary operations such as adding two numbers, subtracting a number from another number, moving a number from one location to another, taking its next instruction from one of two places depending on whether a given number is negative or positive, and so on. In order to instruct the computer to carry out complicated routines, simple instructions for the elementary operations are combined into a program. The writing of a program to carry out even an apparently rather simple procedure can be an exacting task requiring a high degree of skill on the part of the programmer.

It has been the custom for the linguist who wanted to try out a certain approach to mechanical translation to ask an expert programmer to program his material rather than to learn the art of programming himself. Besides the usual inconveniences and difficulties attending the communication between experts in two separate fields, this practice has certain more basic difficulties: neither the linguist nor the programmer has been able to be fully effective. The linguist has not become aware of the full power of the machine, and the programmer, not being a linguist, has not been able to use his special knowledge of the machine with full effectiveness on linguistic problems.

The solution offered here to these difficulties is an automatic programming system. The linguist writes the results of his research in a notation or language called COMIT, which has been specially devised to fill his needs. The programmer writes a conversion program or compiler capable of converting anything written in this notation into a program that can be run on the computer.* Thus the expense, time, and effort needed to separately program each linguistic approach is saved, and, even more important, the linguist is given direct access to the machine. He becomes more fully aware of its potentialities, and his research is greatly facilitated.

† This work was supported in part by the U.S. Army (Signal Corps), the U.S. Air Force (Office of Scientific Research, Air Research and Development Command), and the U.S. Navy (Office of Naval Research); and in part by the National Science Foundation.

* This is being done by the programming research staff of the M.I.T. Computation Center.
What COMIT Is

COMIT is an automatic programming system for an electronic digital computer. It provides the linguist with a simple language in which he can express the results of his researches and in which he can direct the computer to analyze, synthesize, or translate sentences. It is capable of being programmed on any general purpose computer having enough storage and appropriate input and output equipment. The language has been devised to meet the needs of the linguist who wants to work in the fields of syntax and mechanical translation. Some of the linguistic devices and operations that COMIT has been designed to express are: immediate constituent structure, discontinuous constituents, coordination, subordination, transformations and rearrangements, change in the number of sentences or clauses in translation, agreement, government, selectional restrictions, recursive rules, etc.

A program written in COMIT consists of a number of rules written in a special notation. The computer executes these rules one at a time in a predetermined order. In seeking an appropriate notation in which to write the rules, we were guided by several considerations.

1. That the rules be convenient for the linguist: compact, easy to use, and easy to think in terms of.
2. That the rules be flexible and powerful: that they not only reflect the current linguistic views on what grammar rules are, but also that they be easily adaptable to other linguistic views,

A linguist can use the computer in the following simple way. He expresses the results of his linguistic research in COMIT. He transcribes his rules onto punched cards using a device with a typewriter keyboard. He supplies text or special instructions to the machine, also on punched cards. He then gives these packs of cards to an operator and subsequently receives his results from the machine in the form of printed sheets.

The way that a COMIT program works in the computer is shown in figure 1. The rules making up the COMIT program can be thought of as stored in the computer at A. Material to be translated or otherwise operated on enters the computer from the input B under the control of the rules. It is operated on by the rules and translated in the workspace C. It then goes to the output E. The dispatcher D contains special information, stored there by the rules, which governs the flow of control, that is, the order in which the rules of the program are carried out.

Fig. 1. How a COMIT program works in the computer.

The way in which COMIT rules are written, how they direct the computer to perform the desired operations, and how they are assembled into programs will now be described. The remainder of the paper is thus a complete manual of detailed instructions for using this special-purpose programming language.

COMIT Rules and Their Interpretation

A rule in COMIT has five sections, the name, the left half, the right half, the routing, and the go-to, each with its special functions. Figure 2 shows how a rule is divided into these five sections. The name and left half are separated by a space, the left half and the right half are separated by an equal sign, the right half and the routing are separated by two fraction bars, and the routing and the go-to are separated by a space.

Fig. 2. The five sections of a rule in COMIT.

Flow of control

We will discuss first the function of the name and the go-to, which have to do with the flow of control from one rule to another. A program written in COMIT always starts with the first rule in sequence. After a rule has been carried out, the computer obtains from the go-to the name of the next rule to be carried out. The name of each rule is to be found in the left-hand part of the name section of that rule. (The right-hand part of the name section is reserved for the subrule name, to be discussed later.)
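The separator conventions for the five sections can be illustrated with a minimal Python sketch that splits a rule written on one line into those sections. It assumes a simplified layout: the name is the first space-delimited token, the left half runs up to the first equal sign, the right half runs up to the first pair of fraction bars (`//`), and the go-to is the last space-delimited token. `Rule` and `split_rule` are hypothetical names, and the sample rule is invented rather than taken from the figures.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    left_half: str
    right_half: str
    routing: str
    goto: str


def split_rule(text: str) -> Rule:
    """Split a one-line rule into its five sections (simplified layout assumed)."""
    name, _, rest = text.strip().partition(" ")      # name | left half: a space
    left, eq, rest = rest.partition("=")             # left half | right half: "="
    if not eq:
        raise ValueError("missing '=' between left half and right half")
    right, bars, tail = rest.partition("//")         # right half | routing: "//"
    if not bars:
        raise ValueError("missing '//' between right half and routing")
    routing, _, goto = tail.strip().rpartition(" ")  # routing | go-to: last space
    return Rule(name, left.strip(), right.strip(), routing.strip(), goto)


# A made-up rule, for illustration only.
print(split_rule("A NOUN + $1 = 2 + 1 // *D1 C"))
# Rule(name='A', left_half='NOUN + $1', right_half='2 + 1', routing='*D1', goto='C')
```

Treating the go-to as the last token lets a routing section contain several space-separated items. A rule with an empty routing, such as `* X = 0 // *`, yields an empty `routing` string.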
In addition there are three cases when control is automatically transferred to the next rule in sequence regardless of its name. One of these will be immediately clear; the other two will be clarified in the explanations of the left half and the routing. The three are: (1) an asterisk is written in the go-to, (2) the constituents written in the left half of the rule were not found in the workspace, (3) an *R in the routing finds no more material at the input. A rule to which control is always transferred automatically in this fashion, so that a rule name is not needed, may have an asterisk in the name section in place of a rule name. When this automatic transfer of control takes place from the last rule in sequence, so that there is no next rule, the COMIT program stops.

Figure 3 shows an example of how control proceeds from one rule to another under the direction of the rule name and the go-to sections. In this program, rule A would be the first one executed, then C, then the rule with an asterisk in the name section, then B, then C, then *, then back to B again, and so on round and round in what is known as a loop. This continues until one of the conditions occurs in the rule marked asterisk that will automatically transfer control to the next rule, D. After D has been executed, the program will stop.

Fig. 3. A COMIT program to illustrate the flow of control under the direction of the rule name and the go-to sections of the rules.

As an aid to the memory, we will give a way in which each part of a rule in COMIT can be read in English. This will be done by providing English equivalents for all abbreviations used in COMIT, and by providing certain conventional wordings that will always be used between the various sections and between the various abbreviations. For the parts of the rule already discussed we need the following conventions: a rule is preceded by the word "in", rule names are preceded by the words "the rule", the go-to is preceded by the words "then go to", an * in the name section is read "this rule", an * in the go-to is read "the next rule", and the rule is followed by a period to make a sentence. These conventions are enough to read the program in figure 3. These and the other conventions are conveniently tabulated in a later section. According to the conventions, the program in figure 3 should be read:

In/the rule A/... /then go to/the rule C/.
In/the rule B/... /then go to/the next rule/.
In/the rule C/... /then go to/the next rule/.
In/this rule/... /then go to/the rule B/.
In/the rule D/... /then go to/the next rule/.

The dispatcher also can influence the flow of control in the following way: a rule in COMIT may have several subrules. In figure 4, the rule B has four subrules. The rule name is in the left-hand part of the name section of the first subrule. The name of each subrule is in the right-hand part of the name section of that subrule. A rule that does not have several subrules may be thought of as a rule with just one subrule. A rule with only one subrule does not have a subrule name.

Fig. 4. A COMIT program to illustrate a rule with subrules. The rule B has four subrules.

When control is transferred to a rule with several subrules, the dispatcher is consulted for an indication of which subrule is to be carried out. For this purpose the dispatcher contains dispatcher entries. A dispatcher entry of the form B E would cause the computer to execute the subrule E in rule B each time it comes to that rule. If there is no entry in the dispatcher for this particular rule, or if there is an entry but it contains more than one subrule name, the choice is made at random. In other words, if the dispatcher contains the entry B E G, the computer will choose at random between the two alternative subrules E and G. A dispatcher entry having a minus sign in front of its values (subrule names) has the same meaning as it would have if it had all its possible values except those following the minus sign. A dispatcher entry with a rule name but no values has the same meaning as one with all possible values, that is, choose completely at random.
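The control conventions described so far can be simulated in a few lines. The following Python sketch uses the hypothetical names `Rule`, `trace`, `figure3_program` and `choose_subrule`. It encodes only the go-to structure read out for figure 3. Because the rule bodies are not given, the loop exit in the rule marked with an asterisk is modeled by an invented condition that fires after a chosen number of passes. The subrule chooser follows the dispatcher conventions just described: a single value selects that subrule, several values or no entry at all mean a random choice, and a leading minus sign excludes the listed values.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

NEXT = "*"  # "*" in the go-to means "the next rule"; in the name, an unnamed rule


@dataclass
class Rule:
    name: str
    goto: str
    # Hypothetical stand-in for the automatic-transfer conditions (left half not
    # found, or *R finding no more input): True means "pass to the next rule".
    falls_through: Callable[[], bool] = lambda: False


def trace(program: List[Rule], max_steps: int = 100) -> List[str]:
    """Names of the rules entered, in order, until control passes the last rule."""
    position = {r.name: i for i, r in enumerate(program) if r.name != NEXT}
    entered: List[str] = []
    i = 0  # a COMIT program always starts with its first rule
    while i < len(program) and len(entered) < max_steps:
        rule = program[i]
        entered.append(rule.name)
        if rule.falls_through() or rule.goto == NEXT:
            i += 1  # automatic transfer; running off the end stops the program
        else:
            i = position[rule.goto]
    return entered


def figure3_program(passes_before_exit: int) -> List[Rule]:
    """Go-to structure as read out for figure 3; the exit condition is made up."""
    passes = 0

    def exit_condition() -> bool:
        nonlocal passes
        passes += 1
        return passes >= passes_before_exit

    return [Rule("A", "C"), Rule("B", NEXT), Rule("C", NEXT),
            Rule(NEXT, "B", exit_condition), Rule("D", NEXT)]


def choose_subrule(entry: Optional[List[str]], subrules: List[str],
                   rng: random.Random) -> str:
    """Select a subrule as the dispatcher would.

    entry is None when the dispatcher holds no entry for the rule, [] for a rule
    name with no values, and ["-", ...] for values preceded by a minus sign.
    """
    if not entry:
        allowed = list(subrules)  # no entry or no values: choose completely at random
    elif entry[0] == "-":
        allowed = [s for s in subrules if s not in entry[1:]]
    else:
        allowed = [s for s in subrules if s in entry]
    return rng.choice(allowed)  # with one allowed value the choice is fixed


print(trace(figure3_program(passes_before_exit=3)))
# ['A', 'C', '*', 'B', 'C', '*', 'B', 'C', '*', 'D']
```

With three passes the trace reproduces the order given in the text: A, C, the asterisk rule, then the B, C, * loop, until control falls through to D and then off the end of the program.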
The contents of the dispatcher are not altered by any of these processes. How the contents of the dispatcher may be altered will be discussed in the section on the routing.

The English reading of a rule with several subrules is the same as that for a rule with one subrule, except that the words "consult the dispatcher and select" are read following the rule name. In figure 4, the rule B with four subrules is read:

In/the rule B/consult the dispatcher and select/
the subrule D/... /then go to/the rule H/,
the subrule E/... /then go to/the rule H/,
the subrule F/... /then go to/the rule I/,
the subrule G/... /then go to/the rule I/.

Workspace

Having discussed the flow of control, we will turn to the workspace and describe how text to be translated, or other material to be worked on, is represented there. This will prepare us for a discussion of the remaining three parts of the rule, whose function it is to operate on the material in the workspace.

Material is stored in the workspace as a series of constituents separated by plus signs. A constituent consists either of a symbol alone or of a symbol and one or more subscripts. The symbol is written first. It may be the textual material itself (a word, phrase, or part of a word), or it may be any temporary word or abbreviation that the linguist finds convenient to use. Subscripts are of two kinds, logical subscripts and numerical subscripts. Logical subscripts are potential dispatcher entries and thus have the form of a rule name (subscript name) followed by one or more subrule names (values). Numerical subscripts are used for numbering and counting purposes. They consist of a period for the subscript name followed by an integer n in the range 0 ≤ n < 2^15. A constituent may have any number of logical subscripts, but only one numerical subscript.

An example of how linguistic material can be represented in the workspace is given in figure 5. This could be read in English as follows: "a constituent consisting of/the symbol IN/with/the numerical subscript/1/, followed by/a constituent consisting of/the symbol DER/with/the numerical subscript/2/, followed by/a constituent consisting of/the symbol ADJ/with/the numerical subscript/3/, and with/the subscript AFF/having/the value EN/, followed by/a constituent consisting of/the symbol NOUN/with/the numerical subscript/4/, and with/the subscript GENDER/having/the value FEM/." The conventional wordings and the readings for the abbreviations used may be found tabulated near the end of this article.

Fig. 5. Example of how linguistic material may be represented in the workspace.

Left half

Having discussed the name and go-to sections and shown how material is represented in the workspace, we are now ready to discuss the remaining three sections of a rule. First we will take up the left half. A rule with several subrules may have no more than one left half. It is written in the first subrule. The function of the left half is to indicate to the computer which constituents in the workspace are to be operated on by the rest of the rule. The constituents in the workspace to be operated on are indicated by writing constituents in the left half that match them in certain definite respects.

A match condition between a constituent in the workspace and a constituent written in the left half will be recognized if the following conditions hold: (1) The symbols are identical. (2) If the constituent in the left half has any subscripts written on it, the constituent in the workspace must also have at least subscripts with the indicated subscript names; the order of writing the subscripts has no significance. (3) If the logical subscripts in the left half have any values indicated, the subscripts in the workspace must also have at least these values; again the order is unimportant. (4) If a numerical subscript is written in the left half, the numerical subscript in the workspace must have an identical numerical value. However, if .G or .L is written in the left half before the value of a numerical subscript, a numerical subscript in the workspace will be matched if it has, respectively, a value greater than or less than the value written in the left half.

Dollar signs written in the left half have special meanings. $1 may be written in the left half to match any arbitrary symbol. If the $1 is followed by subscripts, they are matched in the normal fashion. A dollar sign followed by any number greater than 1 ($4) will match the indicated number of constituents. It cannot have subscripts. A dollar sign without a number can be written as a constituent in the left half and can match any number of constituents in the workspace, including none. This is called an indefinite dollar sign, while those with numbers are called definite dollar signs.
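The four match conditions, together with the behavior of $1, can be expressed as a single predicate. This Python sketch is an illustration under stated assumptions. A workspace constituent is modeled as a symbol, a mapping from logical subscript names to sets of values, and an optional numerical subscript. A left-half constituent has the same shape, with `symbol=None` standing for $1 and a `compare` field standing for .G or .L. The class and function names are hypothetical. The workspace is the one read out in English for figure 5.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Constituent:
    """A workspace constituent: symbol, logical subscripts, at most one number."""
    symbol: str
    logical: Dict[str, FrozenSet[str]] = field(default_factory=dict)  # name -> values
    number: Optional[int] = None  # 0 <= number < 2**15


@dataclass
class Written:
    """A constituent as written in the left half; symbol=None plays the role of $1."""
    symbol: Optional[str]
    logical: Dict[str, FrozenSet[str]] = field(default_factory=dict)  # empty set: name only
    number: Optional[int] = None
    compare: str = "="  # "=", "G" (.G, greater than) or "L" (.L, less than)


def matches(found: Constituent, written: Written) -> bool:
    # (1) identical symbols, unless $1 was written
    if written.symbol is not None and found.symbol != written.symbol:
        return False
    # (2)-(3) each subscript written must be present, with at least the values written
    for name, values in written.logical.items():
        if name not in found.logical or not values <= found.logical[name]:
            return False
    # (4) numerical subscript: equal, or greater/less than after .G/.L
    if written.number is not None:
        if found.number is None:
            return False
        if written.compare == "G":
            return found.number > written.number
        if written.compare == "L":
            return found.number < written.number
        return found.number == written.number
    return True


# The workspace read out for figure 5.
workspace = [
    Constituent("IN", number=1),
    Constituent("DER", number=2),
    Constituent("ADJ", {"AFF": frozenset({"EN"})}, 3),
    Constituent("NOUN", {"GENDER": frozenset({"FEM"})}, 4),
]
print([matches(c, Written(None, {"AFF": frozenset()})) for c in workspace])
# [False, False, True, False]
```

The last line asks for "any symbol having the subscript AFF". Only the ADJ constituent qualifies, because condition (2) requires the subscript name to be present even when no values are written.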
As an example of how constituents written in the left half can match constituents found in the workspace, figure 6a shows several of the possibilities. Each constituent in the second line represents a constituent as it might be written in the left half. It matches the workspace constituent written directly above it in the first line. In figure 6b, none of the constituents meet the match conditions.

Fig. 6. Examples of match and no-match conditions. The top lines in a) and b) represent constituents in the workspace. The bottom lines represent constituents as written in the left half.

The computer carries out a search for a match condition between each of the constituents written in the left half and corresponding constituents in the workspace in the following way. The first constituent on the left in the left half is compared in turn with each constituent in the workspace, starting from the left, until a match is found. The computer then attempts to match the next constituent in the left half with the next constituent in the workspace, and so on, until either all constituents written in the left half have been matched or one constituent fails to match. In the latter case, the computer starts again with the first constituent in the left half and searches for another match in the workspace. Finally, either a match is found for all of the constituents and the computer goes on to execute the rest of the rule, or the computer cannot find the indicated structure in the workspace, in which case control is automatically transferred to the next rule. It can be seen that a structure will be found in the workspace only if it has matching constituents that are consecutive and in the same order as those written in the left half.

If an indefinite dollar sign is the first constituent in the left half, it will match all of the constituents in the workspace to the left of any constituent that is matched by the second constituent in the left half. If the indefinite dollar sign is the last constituent in the left half, it will match all of the constituents in the workspace to the right of any constituent that is matched by the next-to-last constituent in the left half. If there are two or more indefinite dollar signs written in the same left half, they must be separated by constituents that are not dollar signs, or by $1 with subscripts. This prevents any ambiguity as to which constituents in the workspace are to be found by the several indefinite dollar signs.

If an indefinite dollar sign has constituents written on each side of it in the left half, the computer will first try to match all constituents to the left of the indefinite dollar sign. It does not have to search again for the constituents to the left of the dollar sign unless a number (as will be explained shortly) referring to a constituent to the left of the indefinite dollar sign is written to the right of the indefinite dollar sign. In this case, the computer will search for a new match for constituents to the left of the indefinite dollar sign if it fails to find a match with the constituents to the right of the indefinite dollar sign.

Constituents in the left half are conceived of as being numbered starting with one on the left. The leftmost constituent is called the number one constituent in the left half. When the constituents written in the left half have been successfully matched with constituents in the workspace, the constituents in the workspace that have been found are temporarily numbered by the computer in the same way as the constituents in the left half. The constituent in the workspace found by the number one constituent in the left half thus becomes the number one constituent in the workspace. The temporary numbering of constituents in the workspace remains until it is altered by the right half or until the rule has been completely executed. Its purpose is to allow expressions in the left half, right half and routing to refer to constituents in the workspace by their temporary number.
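The search can be sketched as a small backtracking matcher. This Python sketch is an illustration under several assumptions and is not COMIT's actual procedure:

- Left-half constituents are bare symbols compared by identity; subscripts are ignored.
- `"$n"` stands for exactly n arbitrary constituents.
- `"$"` is the indefinite dollar sign. In the middle it first tries to absorb no constituents, then one more at a time. In first position it must start at the beginning of the workspace, and in last position it takes everything to the right.

The hypothetical function `search` returns one (start, stop) slice per left-half constituent, so the list index plus one is the temporary number. The sample workspace is invented, but it is chosen so that the search passes through the same sequence of failures as the figure 7 example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, stop) slice of the workspace, 0-based


def search(left_half: List[str], workspace: List[str]) -> Optional[List[Span]]:
    """Find left_half in workspace; span k belongs to the number k+1 constituent."""

    def match_from(k: int, pos: int) -> Optional[List[Span]]:
        if k == len(left_half):
            return []
        item = left_half[k]
        if item == "$":
            if k == len(left_half) - 1:  # a final "$" takes everything to the right
                return [(pos, len(workspace))]
            for stop in range(pos, len(workspace) + 1):  # absorb none, then one, ...
                rest = match_from(k + 1, stop)
                if rest is not None:
                    return [(pos, stop)] + rest
            return None
        width = int(item[1:]) if item.startswith("$") else 1  # "$n" spans n
        if pos + width > len(workspace):
            return None
        if not item.startswith("$") and workspace[pos] != item:
            return None
        rest = match_from(k + 1, pos + width)
        return None if rest is None else [(pos, pos + width)] + rest

    # A leading "$" takes everything to the left, so the match must start at 0;
    # otherwise try each starting position from the left.
    starts = [0] if left_half[:1] == ["$"] else range(len(workspace))
    for start in starts:
        found = match_from(0, start)
        if found is not None:
            return found
    return None  # not found: control would pass to the next rule


workspace = ["X", "A", "A", "B", "X", "C", "C", "D"]  # hypothetical contents
print(search(["A", "B", "$", "C", "D"], workspace))
# [(2, 3), (3, 4), (4, 6), (6, 7), (7, 8)]
```

In this result the third and fourth workspace constituents become numbers one and two. The fifth and sixth together become number three, because one number covers everything absorbed by the dollar sign. The seventh and eighth become numbers four and five.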
The various steps in a search are indicated in the example given in figure 7. The lower two lines give the constituents as they are written in the left half of a rule, and the way in which the computer numbers these constituents. The top line indicates the current contents of the workspace. Lines a) through e) represent the way in which the computer temporarily numbers the constituents in the workspace that have been successfully matched at each step of the search.

The first step is indicated in line a): an attempted match between the number one constituent in the left half and the first constituent on the left in the workspace fails. In line b), the number one constituent matches the second constituent in the workspace, but an attempted match between the number two constituent in the left half and the third constituent in the workspace fails. In line c), the number one constituent in the left half matches the third constituent in the workspace, and the number two constituent matches the fourth. Since the number three constituent is an indefinite dollar sign and can match any number of constituents including none, the next constituent, number four, is tried against the fifth in the workspace. This match fails. Having already matched the constituents in the left half to the left of the indefinite dollar sign, the computer now tries to match the constituents to the right of the indefinite dollar sign. In line d), it finds a match of the number four constituent with the sixth, but the number five constituent in the left half fails to match the seventh constituent in the workspace. The computer then tries again with the number four constituent, and in e) finds a match between the number four and number five constituents in the left half and the seventh and eighth constituents in the workspace. Since all of the constituents in the left half have now been found in the workspace, the constituents in the workspace that have been found are left with the numbers as shown in line e). The third, fourth, fifth and sixth, seventh, and eighth constituents in the workspace become respectively the number one, two, three, four, and five constituents in the workspace. Note that two or more constituents in the workspace may be given one number if they are referred to by a dollar sign in the left half.

Fig. 7. Example of the search steps that the computer goes through in order to find in the workspace (top line) the structure written in the left half of the rule (next to bottom line).

It is possible for the left half to be modified to some extent by what is found in the workspace. This can be done by writing a number as a constituent in the left half. The number then refers to the constituent already found in the workspace that has been given that number. The rest of the left half is then executed as if the constituent referred to in the workspace had been written originally in the left half in place of the number. A number written in the left half can only refer to a constituent in the workspace that has already been found by a constituent to the left of it in the left half. It can refer only to a single constituent, one matched by $1 for example. A number written in the left half cannot have subscripts written on it.

Fig. 8. Example of use of a number in the left half (bottom two lines). The attempted match indicated at a) fails, but the one at b) is successful. The contents of the workspace are represented on the top line.

Figure 8 gives an example of the use of a number in the left half. After two unsuccessful matches, the number one constituent in the left half finds the third constituent in the workspace. The number two constituent in the left half is then considered to be replaced by this constituent that has just been found (C/S). The match then fails because the fourth constituent in the workspace does not have at least the subscript S required for a match condition. But when the number one constituent in the left half finally finds the sixth constituent in the workspace, the number two constituent in the left half is considered to be replaced by this constituent (C). The next match is then successful, because this C will, according to the conditions for a match, find the C/S that is next in the workspace.
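A number in the left half can be added to a consecutive search by substituting, at match time, the workspace constituent it refers to. The Python sketch below uses hypothetical names. Its match condition is deliberately simplified: it compares symbols and subscript names, but ignores values and dollar signs. With it, the left half `C + 1` rejects a C/S followed by a bare C, because the C/S then stands in as the written constituent and demands the subscript S. It accepts a bare C followed by C/S. The workspace is invented for illustration and is not the one in figure 8.

```python
from typing import List, NamedTuple, Optional, Union


class Constituent(NamedTuple):
    symbol: str
    subscripts: frozenset = frozenset()  # subscript names only; values omitted


def satisfies(found: Constituent, written: Constituent) -> bool:
    """Simplified match condition: same symbol, at least the written subscripts."""
    return found.symbol == written.symbol and written.subscripts <= found.subscripts


def search(left_half: List[Union[Constituent, int]],
           workspace: List[Constituent]) -> Optional[List[int]]:
    """Consecutive search in which an int k stands for the number k constituent found."""
    for start in range(len(workspace) - len(left_half) + 1):
        found: List[int] = []  # found[k] = workspace index of the number k+1 constituent
        for offset, item in enumerate(left_half):
            # A number is replaced by the workspace constituent it refers to,
            # which is then matched as though it had been written in the left half.
            written = workspace[found[item - 1]] if isinstance(item, int) else item
            if not satisfies(workspace[start + offset], written):
                break
            found.append(start + offset)
        else:
            return found
    return None


C, CS = Constituent("C"), Constituent("C", frozenset({"S"}))
workspace = [CS, C, Constituent("X"), C, CS]
print(search([C, 1], workspace))  # [3, 4]
```

The match is asymmetric. A found C/S used as a pattern requires S, whereas a found C used as a pattern accepts any C, with or without subscripts.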
The English reading of the left half is the same as the reading of the material in the workspace, except that it starts with ", search for a match in the workspace for", ends with ", and if not found, go to the next rule, but if found", and includes conventional wordings for several abbreviations, including the dollar signs and the numbers. For example, A/.G3 + $1 + $ + $2 + 2 in the left half would be read: ", search for a match in the workspace for/a constituent consisting of/the symbol A/with/the numerical subscript/greater than/3/, followed by/a constituent consisting of/any symbol/, followed by/a constituent consisting of/any number of constituents/, followed by/a constituent consisting of/two constituents/, followed by/a constituent consisting of/the number two constituent in the workspace/, and if not found, go to the next rule, but if found".

Right half

The function of the right half is to indicate how the structures found in the workspace by the left half are to be altered. If there is no right half, the structures found in the workspace are left unaltered.

Rearrangement of the constituents found by the left half and temporarily numbered will take place when the appropriate numbers are written in the right half in the desired new order. If any of the numbers referring to constituents in the workspace are not written, these constituents will be deleted. The single digit zero as the only constituent in the right half will cause everything found by the left half to be deleted. The single digit zero is never entered in the workspace.

New constituents will be inserted in any desired place in the workspace when they are written, complete with symbol and any desired subscripts and values, in the desired place in the right half.

The computer will add or alter subscripts when they are written on a constituent or number in the right half. If this constituent already has a logical subscript with the same subscript name as the one that is being added, the two subscripts are combined in a special way called dispatcher logic. If there is no overlap in values, that is, if the two subscripts do not have any values in common, the old subscript is replaced by the new one. But if the two subscripts have any values in common, only the values that are common to the two will be retained. An example is shown in figure 9.

Fig. 9. Example of the combining of subscripts by dispatcher logic. a) shows the number two constituent in the workspace, b) shows the entry in the right half, c) shows the resulting number two constituent in the workspace.

A logical subscript written in the right half with *C in place of its values complements the values of the subscript found in the workspace. That is, all the values that it has are replaced by just those values that it does not have. In other words, *C effectively adds a minus sign in front of the subscript values. In the case of numerical subscripts, the new value replaces, increases, or decreases the old one, depending on whether the value written in the right half follows the period immediately or with an intervening I or D. Since numbers are treated modulo 2^15, 1 added to 2^15 - 1 will give 0, and 1 subtracted from 0 will give 2^15 - 1. Subscripts will be deleted from a constituent when they are preceded by minus signs in the right half. A dollar sign preceded by a minus sign will cause all subscripts on that constituent to be deleted. Subscripts are added, altered, or deleted in the order from left to right in which they are written in the right half. The same subscript will be altered several times if several expressions involving it are written in the right half.

The computer will carry over subscripts from any single numbered constituent in the workspace to any other single numbered constituent indicated by the right half. For this purpose, a subscript name in the right half is followed by an asterisk and a number indicating the number of the constituent from which the subscript is to be carried over. Carried-over subscripts go onto the new constituent in the order from left to right in which they are written in the right half. Logical subscripts go onto the new constituent with dispatcher logic. Numerical subscripts carried over either replace, increase, or decrease the old value, depending on whether ., .I, or .D precedes the asterisk. A dollar sign preceding the asterisk will cause all the subscripts from the indicated constituent to be carried over.
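The subscript arithmetic of the right half is easy to get wrong at the edges. This Python sketch covers four of the operations:

- combining two logical subscripts by dispatcher logic;
- complementing values with *C;
- replacing, increasing, or decreasing a numerical subscript modulo 2^15;
- rearranging or deleting numbered constituents.

The function names are hypothetical. The set of possible values passed to `complement` must be supplied by the caller; the text assumes Q, R, S, and T for figure 10. In `rearrange`, constituents are reduced to bare strings.

```python
from typing import FrozenSet, List, Union

MODULUS = 2 ** 15  # numerical subscripts are kept in 0 <= n < 2**15


def dispatcher_logic(old: FrozenSet[str], new: FrozenSet[str]) -> FrozenSet[str]:
    """Keep only the common values; with none in common, the new one replaces the old."""
    common = old & new
    return common if common else new


def complement(values: FrozenSet[str], possible: FrozenSet[str]) -> FrozenSet[str]:
    """*C: replace the values a subscript has by just those it does not have."""
    return possible - values


def update_number(old: int, value: int, mode: str = "") -> int:
    """.n replaces, .In increases, .Dn decreases; all arithmetic is modulo 2**15."""
    if mode == "":
        return value % MODULUS
    if mode == "I":
        return (old + value) % MODULUS
    if mode == "D":
        return (old - value) % MODULUS
    raise ValueError(f"unknown mode {mode!r}")


def rearrange(found: List[str], right_half: List[Union[int, str]]) -> List[str]:
    """Ints name numbered constituents (unlisted ones are deleted); strings are new ones."""
    if right_half == [0]:
        return []  # a lone 0 deletes everything found by the left half
    if 0 in right_half:
        raise ValueError("0 may only appear alone in the right half")
    return [found[item - 1] if isinstance(item, int) else item for item in right_half]


print(dispatcher_logic(frozenset("AB"), frozenset("BC")))  # frozenset({'B'})
print(update_number(MODULUS - 1, 1, "I"), update_number(0, 1, "D"))  # 0 32767
print(update_number(update_number(7, 4, "I"), 3, "D"))  # 8
print(rearrange(["ART", "ADJ", "NOUN"], [3, 1, "NEW"]))  # ['NOUN', 'ART', 'NEW']
```

The third `print` repeats the arithmetic given for figure 10 (7 + 4 - 3 = 8). The last one shows that omitting number 2 from the right half deletes that constituent.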
After all of the operations indicated by the right half have been carried out on the constituents in the workspace, the numbered constituents remaining in the workspace and any new ones that have been added are given new temporary numbers by the computer, in the order in which they are represented in the right half. These new temporary numbers will be of use when the routing is executed.

Fig. 10. An example of some right-half operations: a) the numbered constituents in the workspace initially, b) the right half, c) the numbered constituents in the workspace finally, and after renumbering.

An example of some of the operations indicated by a right half is given in figure 10. In this example, the number one constituent in the workspace is deleted. The number two constituent has its numerical subscript increased by the numerical subscript carried over from the number one constituent, and then decreased by 3 to give 8 (7 + 4 - 3 = 8). The B subscript is carried over from the number one constituent; the D subscript, not being mentioned, remains unaltered. The E subscript is added from the right half. The F subscript has its values complemented. (We assume that its possible values are Q, R, S, and T.) The G subscript is deleted. Finally, a new constituent is added to the workspace and the constituents in the workspace are renumbered.

The English reading of the right half involves only a few new wordings for abbreviations. These will be found in the section on English reading.

Routing

The function of the routing section of the rule is to alter the contents of the dispatcher, control input and output functions, direct the computer to search a list, and add or remove plus signs in the workspace.

Dispatcher entries may be written in the routing section. When the routing part of the rule is executed by the computer, these entries are sent to the dispatcher, where they combine with the entries already there according to dispatcher logic. Logical subscripts on a constituent in the workspace may also be sent to the dispatcher as dispatcher entries. Conversely, dispatcher entries may be carried over as subscripts onto a constituent in the workspace. This latter operation (to return to the right half for a moment) is done by using the normal notation for carrying over subscripts, but with the letter D referring to the dispatcher. 1/CASE*D written in the right half would cause the CASE dispatcher entry to be carried over and added to the number one constituent in the workspace as a subscript. 2/$*D written in the right half would cause all of the dispatcher entries to be carried over as subscripts onto the number two constituent in the workspace. If the constituent in the workspace already has subscripts of the same kind, the dispatcher entries are combined with them according to dispatcher logic.

*D followed by a number in the routing section will cause all of the subscripts on the indicated numbered constituent in the workspace to be sent to the dispatcher as dispatcher entries, where they combine with any entries already there according to dispatcher logic.

When the computer executes a rule, subscripts designated in the routing section of the rule and dispatcher entries written directly in the routing section are sent to the dispatcher in the order in which they are written, from left to right. This is done after the left and right halves are executed and before the go-to is executed. When subscripts are sent to the dispatcher from the workspace, they are not deleted from the workspace; when they are sent to the workspace from the dispatcher, they are not deleted from the dispatcher.

COMIT has a special provision for rapid dictionary search. Dictionary entries may be written in a list, which will be automatically alphabetized by the computer. This list may be entered from one or more rules called look-up rules. A look-up rule has two special features. *L in the routing section of a look-up rule, followed by one or more numbers referring to consecutively numbered constituents in the workspace, indicates what structure in the workspace is to be looked up in a list. The name of a list, written in the go-to section of the look-up rule, indicates what list the structure is to be looked up in. A list cannot be entered by an automatic transfer of control to the next rule.
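The bookkeeping in the figure 10 example can be traced with a small Python model. Only the arithmetic (7 + 4 - 3 = 8), the F values Q, R, S, and T, and the fate of each subscript come from the text; since the figure itself is not reproduced here, the symbols X, Y, and Z, the numerical subscript name N, the starting F values, and the values of B, D, E, and G are hypothetical placeholders, and the new constituent is assumed to go last.

```python
from dataclasses import dataclass, field


@dataclass
class Constituent:
    symbol: str
    numeric: dict = field(default_factory=dict)   # numerical subscripts: name -> int
    logical: dict = field(default_factory=dict)   # logical subscripts: name -> frozenset of values


F_VALUES = frozenset({"Q", "R", "S", "T"})  # possible values of F, as assumed in the text

# Workspace before the rule; symbols, the subscript name N, and most values are hypothetical.
one = Constituent("X", numeric={"N": 4}, logical={"B": frozenset({"B1"})})
two = Constituent("Y", numeric={"N": 7},
                  logical={"D": frozenset({"D1"}),
                           "F": frozenset({"Q", "S"}),
                           "G": frozenset({"G1"})})

# Number two: increase N by the value carried over from number one, then decrease it by 3.
two.numeric["N"] = two.numeric["N"] + one.numeric["N"] - 3    # 7 + 4 - 3 = 8
two.logical["B"] = one.logical["B"]             # B carried over from number one
two.logical["E"] = frozenset({"E1"})            # E added from the right half
two.logical["F"] = F_VALUES - two.logical["F"]  # F complemented within its possible values
del two.logical["G"]                            # G deleted; D, not mentioned, is untouched

# Number one is deleted, a new constituent is added, and the result is renumbered from 1.
workspace = [two, Constituent("Z")]
numbered = {i: c for i, c in enumerate(workspace, start=1)}
print(numbered[1].numeric["N"], sorted(numbered[1].logical["F"]))  # 8 ['R', 'T']
```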
When entering a list, the computer temporarily deletes all subscripts from the constituents in the workspace indicated by the *L, and all plus signs between those constituents, thus forming one long symbol. It is this long symbol that is looked up in the list.

The list itself has the following structure. The entries are separate rules. The first rule of a list has a hyphen followed by the name of the list in its name section; the rest of the list rules have nothing in their name sections. List rules have only one subrule each. The long symbol formed by a look-up rule is looked up in the left halves of the list rules, so each left half contains only one constituent, consisting of a symbol only, with no subscripts. Each list rule may also have a right half, routing, and go-to. If the long symbol is found in the list, the corresponding right half is executed in the normal fashion. If the number one is written in the right half of the list rule, the long symbol remains in the workspace. If the single number zero is written in the right half, the structure indicated by the look-up rule is deleted. If nothing is written in the right half of the list rule, the items temporarily deleted by the look-up rule are restored and the workspace remains unaltered. If the long symbol is not found in the list, the items temporarily deleted by the look-up rule are restored, leaving the workspace unaltered, and control is automatically transferred to the first rule after the list.

Fig. 11. Example of a list rule with look-up rule and two rules to take care of failure to find the indicated structure.

An example of a list is given in figure 11. Rule A is the look-up rule. It serves to find any number of constituents between spaces in the workspace. (Spaces are indicated in the workspace by hyphens.) If the workspace does not have two spaces, the left half is not found, and control is transferred to the next rule and then goes to C. If the indicated structure is found, the symbols of the constituents between the spaces are formed into one long symbol, which is looked up in list B. If it is not found in the list, control goes to the rule after the list and then to G.

In addition to the look-up rule with its *L abbreviation, there are two other ways of altering the number of plus signs in the workspace. *K followed by one or more numbers referring to consecutively numbered constituents in the workspace will cause the symbols of these constituents to be compressed into one long symbol, and any subscripts that they may have had will be lost. *E followed by one or more numbers referring to consecutively numbered constituents in the workspace will cause the symbols of these constituents to be expanded by the addition of plus signs, so that each character becomes a separate constituent. A list of characters is given in the center column of figure 12. Any subscripts that the original constituents may have had will be lost.

Only one of the abbreviations *L, *K, or *E may be used in any one rule, and when it is used, it must be last in the routing section, to avoid confusion in the numbering of the constituents in the workspace.

The COMIT program communicates with the outside world through input and output functions under control of abbreviations in the routing section. Reading of input material and writing of output material can be done in any one of several channels and in any one of several formats, as follows.

Channels. The particular computer that COMIT is being programmed for (IBM 704) has a number of magnetic tape units connected to it, as well as a card reader, a card punch, and a printer. Magnetic tapes may be prepared for the computer from information on punched cards, and material written on tape by the computer may later be read off on a printer or punched on cards. Each input or output abbreviation designates that reading or writing is to take place in channel A, B, C, or one of the others. Then, before the program is run on the computer, the operator connects the channels used by the programmer to various magnetic tape units, printers, etc. Any channel may be connected to any one of several input or output devices. This gives maximum flexibility of operation, and allows the output of one COMIT program to become the input of another no matter what channels are designated for input and output in the two programs.
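The list look-up and the *K and *E operations described above can be sketched in Python as operations on a workspace held as a list of constituents, with hyphens standing for spaces as in the text. This is a simplified model, not COMIT's implementation: the class and function names are hypothetical, a list rule's right half is reduced to "1", "0", or nothing, list-rule routing and go-to are not modeled, and Python's default string ordering stands in for the computer's alphabetization so that `bisect` can locate an entry quickly.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Constituent:
    symbol: str
    subscripts: dict = field(default_factory=dict)


@dataclass
class ListRule:
    left_half: str                    # a single symbol with no subscripts
    right_half: Optional[str] = None  # "1": keep long symbol, "0": delete, None: restore


def split_characters(symbol: str) -> List[str]:
    """Split a symbol into characters, keeping double characters such as '*5' together."""
    chars, i = [], 0
    while i < len(symbol):
        step = 2 if symbol[i] == "*" and i + 1 < len(symbol) else 1
        chars.append(symbol[i:i + step])
        i += step
    return chars


def compress(ws: List[Constituent], first: int, last: int) -> None:
    """*K first ... last: join the symbols into one; subscripts are lost."""
    ws[first - 1:last] = [Constituent("".join(c.symbol for c in ws[first - 1:last]))]


def expand(ws: List[Constituent], first: int, last: int) -> None:
    """*E first ... last: one constituent per character; subscripts are lost."""
    ws[first - 1:last] = [Constituent(ch) for c in ws[first - 1:last]
                          for ch in split_characters(c.symbol)]


def look_up(ws: List[Constituent], first: int, last: int, entries: List[ListRule]) -> bool:
    """*L first ... last: look the long symbol up in an alphabetized list.

    Returns False, leaving the workspace unaltered, if the symbol is not in the list.
    """
    long_symbol = "".join(c.symbol for c in ws[first - 1:last])
    keys = [e.left_half for e in entries]
    i = bisect.bisect_left(keys, long_symbol)
    if i == len(keys) or keys[i] != long_symbol:
        return False
    rule = entries[i]
    if rule.right_half == "1":
        ws[first - 1:last] = [Constituent(long_symbol)]  # the long symbol remains
    elif rule.right_half == "0":
        del ws[first - 1:last]                           # the structure is deleted
    # With nothing in the right half, the temporarily deleted items are simply restored.
    return True


# The computer alphabetizes list entries; sorted() stands in for that step here.
word_list = sorted([ListRule("THE", "1"), ListRule("CAT", "1"), ListRule("A", "0")],
                   key=lambda r: r.left_half)
ws = [Constituent("-"), Constituent("C", {"X": 1}), Constituent("A"),
      Constituent("T"), Constituent("-")]
print(look_up(ws, 2, 4, word_list), [c.symbol for c in ws])  # True ['-', 'CAT', '-']
expand(ws, 2, 2)
print([c.symbol for c in ws])  # ['-', 'C', 'A', 'T', '-']
```

Constituent numbers follow the paper's convention of counting from one, which is why the sketch subtracts one when slicing the workspace.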
The abbreviation *RW in the routing section, followed by a channel designation, will rewind the tape unit connected to that channel. One channel, channel M, is reserved for monitoring purposes and cannot be rewound; it can only be written on. The COMIT programmer can write on this channel any information that may be of use to him later concerning the correct or incorrect operation of his program. Certain information is also written on this channel automatically if the machine discovers certain mistakes in the program during operation.

Material may be read or written in any one of several formats. Format S (specifiers) involves whole constituents, including symbols and subscripts. Format A is for text, and involves only symbols. Both format S and format A are designed for the particular characters available on the printers and card punches in current use. Other formats may be made available if and when other types of input or output equipment become available.

When material is punched on cards for reading into the computer in format S, it is punched in exactly the way that it is to appear in the workspace, including symbols, subscripts, and plus signs between constituents. Any number of characters up to a maximum of 72 may be punched on a card. When material extends over onto another card, the break between cards can be made at any point where a space is allowed, or anywhere in the middle of a symbol.

When the computer executes a rule with an abbreviation in the routing section that calls for reading in format S from a designated channel, the next constituent from the input is brought into the workspace, where it replaces the designated numbered constituent. For example, *RSA2 would cause the computer to read in format S the next constituent from channel A and send it to the workspace, where it will replace the number two constituent.

When the computer executes a rule with an abbreviation in the routing section that calls for writing in format S, the designated numbered constituents in the workspace are written in the designated channel. They are not deleted from the workspace by this process. For example, *WSM3 5 would cause the computer to write in format S in channel M the number three and the number five constituents from the workspace.

The computer will start a new line or card each time it executes an abbreviation calling for writing in format S. Each line requiring more than 59 characters will end after the next space, fraction bar, or comma, or before the next plus sign, or after 72 characters, whichever comes first. Lines are thus usually ended at a natural break.

Format A is for text, and involves only material written in the symbol sections of constituents. When material is transmitted between the workspace and the input or output channels under the direction of an abbreviation in the routing calling for format A, a special transliteration takes place. The purpose of this transliteration is to allow all of the characters available on the input and output devices to be used in the text. Since many of the available characters have special meanings in the rule (the plus sign separates constituents, the fraction bar separates symbol from subscripts, and so on), these must be represented in a different manner when they are written in the symbol part of a rule if ambiguities are to be eliminated. Accordingly, format A uses the transliteration scheme presented in figure 12.

Fig. 12. Format A transliteration table. When the text characters of column one are read in by an *RA abbreviation, they appear in the workspace as in column two. When the characters of column two are written out by an *WA abbreviation, they appear in the output as in column three.

Note that the characters available for use in symbols consist of the letters, period, comma,
and hyphen, and an asterisk followed by any character but space.

The first column of figure 12 lists all of the characters available on the printer and card punch. The second column shows how these characters appear in the workspace after they have been brought in by an input operation calling for format A. Note that the letters, period, and comma are brought in unchanged, the space becomes a hyphen in the workspace, and all other input characters are prefixed by an asterisk in the workspace. The end-of-line symbol *. is brought in after the last non-space character on the card.

The second column also lists all possible characters that can be written unambiguously in symbols in a rule. Some of the characters are single and some are double, consisting of an asterisk followed by another character. (An *E expand abbreviation written in the routing does not insert a plus sign between the asterisk and the other character of a double character.)

The third column of figure 12 shows how the characters of the second column will be printed after a write abbreviation calling for format A has been executed. The hyphen is written as a space, *. is interpreted as end of line, or carriage return, and all other characters are unchanged except that the asterisk is removed from the double characters. Since the printer can print a maximum of 120 characters in a line, the computer will automatically end a line after 120 characters have been written if a *. has not ended it sooner.

When the computer executes a rule with an abbreviation in the routing section that calls for reading in format A from a designated channel, the next character is brought in from the input, transliterated, and entered into the workspace in place of the designated constituent. For example, *RAB2 would cause the computer to read in format A the next character from channel B and send it to the workspace, where it will replace the number two constituent.

When the computer executes a rule with an abbreviation in the routing section that calls for writing in format A, the symbols from the designated numbered constituents in the workspace are assembled into a long symbol, transliterated, and written in the designated channel. For example, *WAM1 2 4 would cause the computer to write in format A in channel M the symbols from the number one, two, and four constituents in the workspace. The workspace remains unchanged in this process.

The input and output abbreviations used in the routing section of a rule start with an asterisk, followed by R or W for read or write; then there follows a letter designating format A or S, then a letter designating a channel, usually A, B, or C (or M, in the case of a write abbreviation only), and finally one number in the case of a read abbreviation, or one or more numbers in the case of a write abbreviation, designating the numbered constituents in the workspace that are involved. Examples have been given in previous paragraphs.

Summary

This notational system is convenient and well adapted to a large class of problems, including language translation and formal algebraic manipulation. The computer automatically converts programs in this notation into actual computer programs. Programs are written in the notation as a series of rules, each of which may have five parts: the name, the left half, the right half, the routing, and the go-to.

An arbitrary rule name may be written in the name section of each rule. In the go-to is written the name of the next rule to be executed.

The material to be operated on exists in the computer as a series of constituents in the workspace. The function of the left half is to indicate which constituents are to be operated on by the computer. This is done by writing in the left half only enough about the constituents or their context to uniquely identify them. In this way, the same rule can be made to apply in a variety of situations that are the same in certain respects. There is a convenient way of locating two or more constituents in the workspace that match each other in a certain way without having to know in what way they match. If the constituents indicated in the left half cannot be found in the workspace, control goes to the next rule instead of to the rule mentioned in the go-to. This is one type of program branch.

The function of the right half is to indicate what operations are to be performed on the constituents found by the left half. It is possible to add, delete, and rearrange constituents. It is also possible to add subscripts to any constituents, and to rearrange, delete, and calculate with them. There are two kinds of subscripts: numerical subscripts, which can be used for counting and simple arithmetic operations, and logical subscripts, which can conveniently be used for logical calculations. Both types of
subscripts may be used in the left half to help indicate the material to be operated on. They can thus enter into the condition for a program branch. Logical subscripts can in addition be sent to the dispatcher where, as dispatcher entries, they become effective in controlling n-way program branches. Each dispatcher entry controls which of several subrules is to be carried out in a given rule.

A third type of program branch is provided by the facility for looking up material from the workspace in a list expressed as a series of list rules. This facility can be used for dictionaries. The computer will automatically alphabetize all material in lists to facilitate the look-up operation.

The function of the routing section is to control input and output operations, to control the flow of information to and from the dispatcher, to control list look-up operations, and to bring several constituents together into one constituent or separate a constituent into several constituents, one for each character.

Input and output facilities provide maximum convenience for the user. In addition, the system has a number of built-in checks that will help the programmer find any mistakes he may make in writing his program.

How to Read a Rule in COMIT

The purpose of this section is to present a summary of the various conventions used for reading a rule of COMIT in English. The readings are, of course, purely mnemonic, for they cannot describe completely what the computer does when it executes the rule.

The various abbreviations used in a rule are tabulated in figure 13. Some abbreviations have several different English readings, depending on what part of the rule they are in. When this is the case, a note has been inserted in the table to indicate the contexts in which the abbreviation should be given the various readings.

In addition to the English readings associated with the abbreviations, there are conventional wordings that are not associated with any particular abbreviations, but instead with certain positions in the various sections and parts of the rule. To summarize these conventional wordings, figure 14 presents a sample rule and its complete reading. The wordings that are associated with the format are provided with an explanatory note giving the circumstances under which they are used.

Fig. 13. Abbreviations used in COMIT and their English readings.

Fig. 14. Conventional wordings that are associated with the format of a rule. The left-hand column names the various sections and parts of the sample rule with which the wordings of the last column are associated.
How to Write a Rule in COMIT

The purpose of this section is to present the conventions that must be adhered to when writing a COMIT rule.

General: The left-hand 72 columns of the punched card are available for writing COMIT rules. The other 8 columns can be used for numbering the cards if so desired. If a rule requires more than 72 columns to write, a hyphen may be used at the end of one card and the rule continued on the next card in any column. To indicate a space between the hyphenated parts of the rule, leave a space before the hyphen. Comments enclosed in parentheses are interpreted by the computer as spaces. No parentheses may be included within a comment. A comment continued onto the next card should be hyphenated.

Name section: The first subrule of a rule has a rule name starting in column one. A rule that is never referred to by name in a go-to or in the dispatcher may have an asterisk in column one instead of a name. All subrules of a rule with more than one subrule have a subrule name. The subrule name is separated from the rule name by one or more spaces; otherwise it starts in any column after the first. A rule can have a maximum of 36 subrules. If there are several rules with the same rule name, they must have identical sets of subrule names. The first rule of a list has a hyphen in column one followed by the list name. The rest of the rules in a list have nothing in the name section. A name consists of 12 or fewer consecutive characters. The characters available are the letters of the alphabet, the numbers, and the period and hyphen in medial position, that is, not at the beginning or end of the name.

Fig. 15. A tabulation of all the types of subscripts allowed in the left and the right halves of rules.

Left half: The first subrule of a rule carries the left half if there is one. All list rules have a left half and only one subrule. The left half is separated from the name by one or more spaces; otherwise it starts in any column after the first.

When the left half could be confused with a subrule name, it should be followed by an equal
sign to resolve the ambiguity. The possible ambiguity is between a left half consisting of a symbol with no subscripts in a rule with no subrule name or right half, and the subrule name of a first subrule with no left or right half.

The left half consists of one or more constituents separated by plus signs and optional spaces. A constituent may be a symbol or $1, with or without subscripts; or it may be a definite or indefinite dollar sign without subscripts; or it may be a number, without subscripts, referring to a numbered constituent already found in the workspace. The left half of a list rule consists of a single constituent composed of a symbol only.

A symbol is any uninterrupted sequence of characters. A character in a symbol may be a letter, period, comma, or hyphen, or an asterisk followed by any character except space. These latter double characters are treated as single characters by the *E abbreviation. The characters have been summarized in figure 12.

If a constituent has subscripts, these follow the symbol and are separated from it by a fraction bar and optional spaces. Subscripts are separated from each other by commas and optional spaces. A logical subscript has a subscript name written like a rule name. If it has values, these have the form of subrule names and are separated from the subscript name and from each other by one or more spaces. A logical subscript need not refer to a rule name, but if it does, its values are restricted to the subrule names of that rule. The types of logical and numerical subscript expressions available for use in the left half are tabulated in figure 15 and indicated by an L. The table also gives an indication of the meaning of the subscripts and of how the logical subscript values are stored in the computer in terms of zeros and ones.

Right half: Any rule that has a left half may have right halves in its subrules. Each right half is marked by a preceding equal sign and optional spaces. The right half consists of one or more constituents separated by plus signs and optional spaces. A constituent in the right half may be a symbol with or without subscripts, or it may be a number, with or without subscripts, referring to a numbered constituent in the workspace. The types of logical and numerical subscripts available for use in the right half are also listed in figure 15, and indicated by an R.

Routing section: The routing section, if written, is preceded by two fraction bars and optional spaces. In the routing section, dispatcher entries may be written in the same way that subscripts and values are written in the right half. In addition, the input abbreviations *RAA, *RAB, etc., and *RSA, *RSB, etc. may be written, followed by a number designating one numbered constituent in the workspace. The output abbreviations *WAA, *WAB, etc., and *WSA, *WSB, etc. may be followed by one or more numbers referring, in any order, to numbered constituents in the workspace. The *L, *K, and *E may be written followed by one or more numbers referring to consecutively numbered constituents in the workspace. The numbers are separated by one or more spaces. Separate entries in the routing section are separated by commas and one or more spaces. Only one *L, *K, or *E abbreviation may be written in any rule, and it must be the last thing written in the routing section.

Go-to: In the go-to is written either the name of the rule or list that is to be executed next, or an asterisk signifying that the next rule in sequence is to be executed next. The go-to is separated from the rest of the rule by one or more spaces.

The author wishes to express his appreciation to S. F. Best, F. C. Helwig, G. H. Matthews, A. Siegel, and M. R. Weinstein for their many helpful criticisms and suggestions.

Appendix

Some Sample Programs

We now present a few simple programs written in COMIT. These programs have been chosen for their illustrative and pedagogical value. To see how the computer carries out these programs, the reader may have to keep track of the contents of the workspace and dispatcher on a separate piece of paper while going through the programs.

The first seven examples show how some simple operations on text can be carried out. The first one will bring 25 characters of text into the workspace from the input. The remaining six will insert position markers in various places between the characters in the workspace or make various substitutions or order changes. The position markers must be chosen in such a way that they will not be confused with other constituents.
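The effect of the first example, bringing 25 characters of text into the workspace, can be approximated with the format A transliteration described earlier: letters, period, and comma come in unchanged, a space becomes a hyphen, every other character gains an asterisk prefix, and *. marks the end of a card; on output the process is reversed, with a line ending after 120 characters. This Python sketch is not the COMIT program from the appendix. It assumes cards arrive as strings and treats only uppercase A to Z as letters, and the function names are hypothetical.

```python
import itertools
import string
from typing import Iterable, Iterator

# Assumption: only the uppercase letters A-Z count as "letters".
LETTERS = set(string.ascii_uppercase)


def read_format_a(cards: Iterable[str]) -> Iterator[str]:
    """Yield one workspace symbol per input character, transliterated as for format A."""
    for card in cards:
        for ch in card.rstrip(" "):
            if ch in LETTERS or ch in ".,":
                yield ch            # letters, period, comma: unchanged
            elif ch == " ":
                yield "-"           # a space becomes a hyphen
            else:
                yield "*" + ch      # any other character gets an asterisk prefix
        yield "*."                  # end of line, after the last non-space character


def write_format_a(symbols: Iterable[str], width: int = 120) -> str:
    """Assemble symbols into one long symbol and transliterate it for printing."""
    long_symbol = "".join(symbols)
    lines, line, i = [], [], 0
    while i < len(long_symbol):
        ch = long_symbol[i]
        if ch == "*" and i + 1 < len(long_symbol):
            i += 2
            if long_symbol[i - 1] == ".":     # *. ends the line
                lines.append("".join(line))
                line = []
                continue
            line.append(long_symbol[i - 1])   # other double characters lose the asterisk
        else:
            line.append(" " if ch == "-" else ch)  # a hyphen prints as a space
            i += 1
        if len(line) == width:                # the printer's line limit
            lines.append("".join(line))
            line = []
    if line:
        lines.append("".join(line))
    return "\n".join(lines)


# Like the first sample program: bring 25 characters of text into the workspace.
workspace = list(itertools.islice(read_format_a(["THE CAT SAT ON THE MAT, 3 TIMES."]), 25))
print(workspace[-3:])              # [',', '-', '*3']
print(write_format_a(workspace))   # THE CAT SAT ON THE MAT, 3
```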
The ninth example is a simple word-for-word translation routine. The text is brought in a character at a time, and each character is looked up in a list to see if it is a letter or mark of punctuation. Each continuous string of letters between punctuation marks or spaces is looked up in the dictionary. The punctuation marks and spaces are carried over into the output text unchanged. Any word that is not found in the dictionary is printed in its original form and enclosed in parentheses. Alternative meanings are separated by fraction bars. An output line is printed as soon as a word is translated that makes the line exceed 55 characters in length. A slight additional complication would be needed to prevent a line from starting with a space or mark of punctuation, and to allow for the hyphenation of long words at the end of the line.

The eighth example illustrates another class of problems that COMIT is convenient for, that is, problems of an algebraic or manipulational nature.

Readers who would like to use the COMIT system should correspond with the author for further details.
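The output conventions of the ninth example (the word-for-word translation routine) can be made concrete with a short Python imitation. This is a sketch, not a translation of the COMIT program: a regular expression stands in for the character-by-character list look-up that separates letters from punctuation, and the function name and the German-English dictionary entries are hypothetical.

```python
import re
from typing import Dict, List

# A run of the letters A-Z, or any single other character (space, punctuation, ...).
TOKEN = re.compile(r"[A-Z]+|[^A-Z]")


def translate(text: str, dictionary: Dict[str, List[str]], limit: int = 55) -> List[str]:
    """Word-for-word translation; returns the output lines in the order printed."""
    lines, line = [], ""
    for token in TOKEN.findall(text):
        if "A" <= token[0] <= "Z":
            meanings = dictionary.get(token)
            # Alternatives are separated by fraction bars; unknown words go in parentheses.
            line += "/".join(meanings) if meanings else "(" + token + ")"
            if len(line) > limit:   # print once a translated word pushes the line past the limit
                lines.append(line)
                line = ""
        else:
            line += token           # spaces and punctuation are carried over unchanged
    if line:
        lines.append(line)
    return lines


# Hypothetical German-English entries, for illustration only.
DICTIONARY = {"DER": ["THE"], "HUND": ["DOG"], "LAEUFT": ["RUNS", "WALKS"]}
print(translate("DER HUND LAEUFT SCHNELL.", DICTIONARY))
# ['THE DOG RUNS/WALKS (SCHNELL).']
```

As the text points out, a line printed this way can begin with a space or punctuation mark carried over from the input, and long words are never hyphenated; handling either case would need the additional logic the paper mentions.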