Scientific report: "Braille Transcription and Mechanical Translation"
[Mechanical Translation, vol.2, no.3, December 1955; pp.50-53]

Braille Transcription and Mechanical Translation

John P. Cleave, Birkbeck College, University of London, London, England

Transcribing romanized print into Braille suitable for reading by the blind is a problem which has similarities to those arising in mechanical translation. The theoretical problem of mechanical translation is to construct an operational syntax - a set of formal rules of translation prescribing operations to be performed on the text to get the output text - entirely in terms of patterns of input words and types of words and such information as may be contained in the dictionary. In Braille transcription this problem is already simplified, firstly by the small vocabulary (consisting of a definite number of letters, capitalized letters, punctuation marks, etc.) and the absence of ambiguity and, above all, by the existence of explicit rules for transcription which are already partly formalized.

The Braille Systems

Braille is a system of embossed characters formed by six dots arranged and numbered as in Fig. 1(a). In the project outlined here the output of the computer presents the Braille characters as a series of six "1's" or "0's" corresponding to the six Braille dots. Thus the Braille character of Fig. 1(b) is represented by the binary number of Fig. 1(c).

Figure 1. (a) The six dot positions, numbered 1, 2, 3 down the left-hand column and 4, 5, 6 down the right-hand column; (b) a sample Braille character; (c) its binary representation, 101011.

While to each letter-press character there corresponds one Braille sign, there are Braille characters (single-cell contractions) and pairs of Braille characters (double-cell contractions) which under various conditions represent groups of inkprint letters. Thus, the Braille character of Fig. 2 represents the group "wh" in that order. The rules of Braille largely concern the conditions under which contractions can be made.

Figure 2. The Braille character for the group "wh".

There are four grades of Braille: Grade I, uncontracted; Grade "one-and-a-half"; Grade II, moderately contracted; Grade III, highly contracted. The last of these is rarely used. Grade I presents no problem to the computer. Grades "one-and-a-half" and II are the more profitable lines of inquiry.

The problem to be dealt with is that of constructing a program by which an electronic computer will do the work of making the contractions correctly. We envisage an input organ to the electronic computer with a keyboard with keys for all the characters used in inkprint (including punctuation marks). The output from this organ is in the form of binary numbers (machine characters) on which the computer operates, finally obtaining from each such number a six-digit binary number representing the six Braille dots (Fig. 1). An output mechanism, similar to an ordinary teleprinter (it could in fact be such a piece of equipment fitted with a mechanical device), will convert this number into the Braille characters as actually used.

The Braille signs used in this project are shown in Fig. 3. These characters are divided into classes called "lines." Line 1 is formed by dots 1-2-4-5. Line 2 is formed by adding dot 3 to each of the characters of line 1, and line 3 by the addition of dots 3 and 6 to line 1. Line 4 is formed by the addition of dot 6 to line 1 signs. Line 5 is obtained by repeating line 1 in a lower position. This classification has no significance as far as the Braille rules are concerned.

A further classification of Braille signs, which cuts across the "line" division, is the classification into "lower signs" and "non-lower signs"; a lower sign is a Braille sign which does not contain dot 1 or dot 4. The lower signs are all those of line 5 together with "com" of line 6. This again is a formal property of the Braille sign, but for technical convenience it is explicitly represented by a code digit attached to the coded Braille. The rule concerning the contraction of double letters requires explicit mention of the lower sign property.

Figure 3. The Braille signs used in this project, by line (dot patterns not reproduced):
  First Line:   A B C D E F G H I J
  Second Line:  K L M N O P Q R S T
  Third Line:   U V X Y Z and for of the with
  Fourth Line:  ch gh sh th wh ed er ou ow W
  Fifth Line:   , be con dis en ff gg in; bb cc dd
  Sixth Line:   st ing ble ar com

Formalization of the Rules

The rules followed in this work are those printed in Standard English Braille [1]. The rules as expressed in the booklet are not all usable for a mechanical transcription of inkprint characters into Braille as they stand, though they are perfectly satisfactory for a human agent. To be put in a form suitable for the construction of a machine program the rules must be formalized. That is, all reference to terms which cannot be given an extensional definition in terms of the machine characters, or a definition in terms of their formal properties, must be eliminated. For instance, rule 34 reads:

  Contractions forming parts of words should not be used when they are likely to lead to obscurity in recognition or pronunciation and therefore they should not overlap well-defined syllable divisions. Word signs should be used sparingly in the middle of words unless they form distinct syllables. Special care should be taken to avoid undue contraction of words of relatively infrequent occurrence.

[1] Published by the National Institute for the Blind, London, 1932.

The principal term in this rule is "syllable." It would be possible to formalize this term if a complete list of syllables could be compiled. This would be a clumsy procedure and would require comparison of incoming words with a large dictionary for recognition of syllables. Similar difficulties arise with "pronunciation," though the problem is largely solved when the "syllable" question has been resolved. The simplest way to resolve the issue is to ignore the restrictions imposed by this rule.

Another rule which includes a non-formal restriction is rule 21:

  The word signs and, for, of, the, with, a, may follow one another without a space where the sense permits.

The condition "where the sense permits" is impossible to formalize fully except by constructing a list of phrases in which the elimination of the space between these "and-words" may be effected without destroying the sense. However, the sense may not be determined by the phrase but by the whole sentence. The task of including this condition in its entirety in a machine program is then immense. Confusion could arise when a space is eliminated between and-words where at least one is part of a word.
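As a brief aside on the six-digit representation and the lower-sign property described under "The Braille Systems", the following Python fragment builds signs from their raised dots. It is only a minimal sketch: the helper names (cell, add_dots, is_lower_sign) are hypothetical, and it assumes that the digits of Fig. 1(c) are read in dot order 1 to 6. Neither is taken from the original program.

```python
# Hypothetical helpers; the digit order (dot 1 first) is an assumption
# based on reading Fig. 1(c) from left to right.

def cell(*dots: int) -> str:
    """Return the six-digit string for a sign, given its raised dots."""
    if any(d < 1 or d > 6 for d in dots):
        raise ValueError("dots are numbered 1 to 6")
    return "".join("1" if d in dots else "0" for d in range(1, 7))


def add_dots(sign: str, *dots: int) -> str:
    """Raise additional dots in an existing six-digit sign."""
    return "".join("1" if (i + 1) in dots else bit for i, bit in enumerate(sign))


def is_lower_sign(sign: str) -> bool:
    """A lower sign contains neither dot 1 nor dot 4."""
    return sign[0] == "0" and sign[3] == "0"


A = cell(1)              # first sign of line 1
K = add_dots(A, 3)       # line 2: add dot 3 to the line 1 sign
U = add_dots(A, 3, 6)    # line 3: add dots 3 and 6 to the line 1 sign
CH = add_dots(A, 6)      # line 4: add dot 6 to the line 1 sign

print(A, K, U, CH)
print(cell(1, 3, 5, 6))                              # the number of Fig. 1(c)
print(is_lower_sign(cell(2)), is_lower_sign(A))      # dot 2 alone is a lower sign
```

Because the lower-sign test only inspects two digits, it could equally be precomputed once and carried as a code digit, which is the choice the paper makes.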
The restriction on Rule 21 could then be formalized to read:

  ". . unless at least one of the and-words is part of a word"

It is simpler to ignore the wider restriction and to base the space-elimination entirely upon the occurrence of the words. More will be said of this rule later.

On the other hand some of the rules are already adequately formalized. For instance, Rule 27:

  The contractions bb, cc, dd, ff, gg may only be used when they occur between letters and signs of the same line of Braille.

Since "word" and "line" can be given formal definitions the rule as it stands is sufficient, though it is more explicit (ignoring the complication caused by "line") if we simply say:

  Use the contractions bb, cc, dd, ff, gg if the sign preceding and the sign following bb, cc, dd, ff, gg are neither spaces nor punctuation marks.

An important principle in formalizing the rules is the explicit representation in the machine characters of the properties used for the operation of the program. For instance, a word can be defined formally as the series of signs lying between signs each of which is either a space or punctuation mark. We therefore require that the computer recognize the punctuation marks. It would obviously be possible to define the punctuation marks extensionally as "either the comma or full stop or exclamation mark or.." The process by which the machine recognizes the punctuation mark is then quite complicated, involving comparison of the incoming letter with each punctuation mark in turn, which is slow and wasteful of storage space. The simplest procedure is to indicate membership of this group by a digit of the machine character. Several other properties, either of the Braille characters or of the letter-press characters, and membership of various other classes are best represented by digits of the machine characters.

The Structure of the Machine Characters

The machine characters must bear the six digits representing the Braille dots. It is technically convenient to represent the membership of the various classes of sign by a set of three digits (the code digits) preceding the six Braille digits, so that the machine character is a number with nine binary digits. Thus the machine character has the following structure:

  1st position: punctuation digit
  2nd position: "and"-word digit
  3rd position: "lower sign" digit

These are the code digits. The 4th-9th positions represent the Braille dots: these digits are the machine representation of Braille.

The first digit, showing whether the letter is a punctuation mark, presents explicitly a property of the alphabetic letter rather than of the structure of the corresponding Braille sign, for a Braille sign may be used either as a contraction or as a punctuation mark (see the signs of line 5). Since some of the Braille rules concern the occurrence of punctuation marks, it is necessary that the machine characters corresponding to such signs carry that information explicitly. Thus the machine can determine the presence of a punctuation mark in the accumulator by shifting left one place and then using the conditional transfer order to discriminate on the sign digit.

Pattern Sensing

A method of detecting patterns of signs is to delay the final printing while sending the last several characters in turn through a series of memory locations. The context of any machine character can then be searched. An illustration of this process is provided by the following method of operating Rule 21 mentioned above.

The series of machine characters, after having been modified by the contraction program to produce the and-word characters, is sent serially through five memory locations. If the conditions for space elimination are not present, the character in the fifth position is sent to the "print routine" which removes the code digits and prints the six digits representing the Braille sign. The characters in the remaining positions are then shifted one place by the "shift routine," leaving the first place to be occupied by a new character from the contraction routine. Rule 21 in the form required by the machine program now reads:

  (i) if there are either punctuation marks or spaces in locations (1) and (5) go to (ii); if not go to the print routine.
  (ii) if there is a space in (3) go to instruction (iii); if not go to the print routine.
  (iii) if there are and-words in both positions (2) and (4) shift the character in (2) to (3) and that in (1) to (2) (space-elimination); if not go to the print routine.

This version of the rule is in fact weaker than the original since it permits only pair-wise juxtaposition of and-words. But it does deal adequately with the majority of cases. It would be possible to construct a routine for effecting the space-elimination in all the circumstances demanded by the formalized version:

  "the 'and' words may follow one another without a space unless at least one of them is part of a word"

This, however, would be rather long and would not be justified by the frequency with which three or more consecutive and-words occur, compared with the relatively large frequency of pairs of and-words.

More complicated procedures of a similar nature are necessary to operate the rules concerning numerical expressions, ellipsis, compound lower signs and capital letters.

The Dictionary

In Grade "one-and-a-half" it is unnecessary to have a dictionary for the contractions; incoming letters may be compared on arrival with possible members of contractions by means of a "contraction routine." Thus, if an "a" is detected, the contraction routine compares the following character with "r". If an "r" is found, the "ar" contraction is subjected to the next part of the program; if not, "a" is sent to the next part of the program, after which the letter following "a" is examined to determine whether it could be the initial letter of a group which could be contracted.

Grade II Braille, on the contrary, contains so many contractions that it is necessary to use a "dictionary" of groups which can be contracted. Characters must then be fed in serially and stored in a set of temporary locations - the Initial Word Store - until a whole word has been received. The dictionary matching mechanism then takes the first letter in the Initial Word Store and finds the longest dictionary entry which is part of that word. The appropriate contraction is selected and sent to another set of storage locations - the Final Word Store - after which the remainder of the word is treated in the same way. Should no entry be found, the first letter is sent to the Final Word Store and the matching procedure started with the second letter.

There may be several ways of contracting a word. The choice between the methods of contraction is governed by considerations of length. That way must be chosen which gives the shortest transcription. The case where two different methods of contraction yield words of equal length is governed by rule 35:

  In cases where a word may according to the above rules be contracted in two or more ways, each saving the same amount of space, that way should be selected which produces the most readable combination of dots. If the same space is saved, simple contractions are better than two-celled word-signs. Avoid using Double Letter Signs where there is an alternative single cell contraction.

The dictionary is so constructed that the shortest set of contractions is automatically chosen. For instance, "themselves" precedes "the" in the dictionary so that if "themselves" occurs in the Initial Word Store it is compared with the appropriate entry before being compared with "the". If, however, "them" occurs in the text, the longest dictionary entry occurring which is part of that word is "the".

The priority rule for single-cell contractions is handled by including in the dictionary those groups which admit a double "translation." For instance, the group "oner" occurs in the dictionary and precedes "one". "Oner" may be contracted in two ways - "one r" and "o n er." In the first case "one" is a two-cell contraction so that "one r" occupies three cells. In the second case the translation also occupies three cells since "er" is a single-cell contraction. By rule 35 "o n er" is the correct translation of "oner", so the dictionary includes "o n er" as the dictionary entry. Thus, Rule 35 does not appear explicitly in the machine program but is implicit in the construction of the whole program and, in particular, of the dictionary.
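The dictionary matching procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the original program: the table is a toy one built only from groups the text mentions, the entries are labels for contractions rather than dot patterns, and the names DICTIONARY and transcribe_word are hypothetical. Longer entries are simply listed first, so the first entry that matches at the current position is also the longest.

```python
# Toy dictionary (an assumption for illustration). Order matters:
# "themselves" precedes "the", and "oner" precedes "one".
DICTIONARY = [
    ("themselves", ["themselves"]),
    ("oner", ["o", "n", "er"]),   # stored already resolved by rule 35
    ("the", ["the"]),
    ("one", ["one"]),
    ("er", ["er"]),
    ("ar", ["ar"]),
]


def transcribe_word(initial_word_store: str) -> list[str]:
    """Move one word from the Initial Word Store to the Final Word Store."""
    final_word_store: list[str] = []
    pos = 0
    while pos < len(initial_word_store):
        for group, pieces in DICTIONARY:
            # Compare the entry with the letters starting at the current one.
            if initial_word_store.startswith(group, pos):
                final_word_store.extend(pieces)
                pos += len(group)
                break
        else:
            # No entry found: send the single letter on and move to the next.
            final_word_store.append(initial_word_store[pos])
            pos += 1
    return final_word_store


for word in ("themselves", "them", "oner", "cart"):
    print(word, "->", transcribe_word(word))
```

Storing "oner" with its preferred segmentation is what lets rule 35 stay out of the matching loop: the loop never compares alternatives, it only takes the first entry that fits.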
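Returning to the machine form of Rule 21 given under "Pattern Sensing", the sketch below packs a sign into nine binary digits and passes the characters through five locations. Several points are assumptions made for the illustration and are not stated in the paper: a space is taken to be the all-zero character, the beginning and end of the text are treated as boundaries, bit masks stand in for the shift-and-conditional-transfer test on the accumulator, and all names are hypothetical.

```python
PUNCT_DIGIT = 0b100_000000    # 1st position
AND_DIGIT = 0b010_000000      # 2nd position
LOWER_DIGIT = 0b001_000000    # 3rd position
BRAILLE_MASK = 0b000_111111   # 4th-9th positions
SPACE = 0                     # assumption: a space is the all-zero character


def machine_char(dots, punct=False, and_word=False):
    """Pack the raised dots and the three code digits into nine binary digits."""
    value = 0
    for d in dots:
        value |= 1 << (6 - d)             # dot 1 is the leftmost Braille digit
    if punct:
        value |= PUNCT_DIGIT
    if and_word:
        value |= AND_DIGIT
    if dots and 1 not in dots and 4 not in dots:
        value |= LOWER_DIGIT
    return value


def is_boundary(ch):
    # Assumption: an empty location (start or end of text) counts as a boundary.
    return ch is None or ch == SPACE or bool(ch & PUNCT_DIGIT)


def rule_21(chars):
    """Send characters through five locations, joining pairs of and-words."""
    source = iter(chars)
    loc = [None] * 6                      # loc[1] is newest, loc[5] is oldest
    printed = []
    exhausted = False
    while True:
        if loc[1] is None and not exhausted:
            loc[1] = next(source, None)   # new character from the contraction routine
            exhausted = loc[1] is None
        if (is_boundary(loc[1]) and is_boundary(loc[5])            # step (i)
                and loc[3] == SPACE                                 # step (ii)
                and loc[2] is not None and loc[4] is not None
                and loc[2] & AND_DIGIT and loc[4] & AND_DIGIT):     # step (iii)
            loc[3], loc[2], loc[1] = loc[2], loc[1], None           # space-elimination
            continue
        if loc[5] is not None:
            # Print routine: drop the code digits, keep the six Braille digits.
            printed.append(format(loc[5] & BRAILLE_MASK, "06b"))
        # Shift routine: move every character one place, freeing location (1).
        loc[5], loc[4], loc[3], loc[2], loc[1] = loc[4], loc[3], loc[2], loc[1], None
        if exhausted and all(c is None for c in loc):
            return printed


B = machine_char((1, 2))
OF = machine_char((1, 2, 3, 5, 6), and_word=True)
THE = machine_char((2, 3, 4, 6), and_word=True)
print(rule_21([B, SPACE, OF, SPACE, THE, SPACE, B]))
```

In this run the space between the two and-words is dropped while the spaces on either side are kept. As the text notes, only pairs are joined: after one elimination, location (3) no longer holds a space, so a third and-word is left separate.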