[Mechanical Translation, vol. 5, no. 1, July 1958; pp. 2-7]

An Input Device for the Harvard Automatic Dictionary†

Anthony G. Oettinger, Computation Laboratory, Harvard University, Cambridge, Massachusetts

A standard input device has been adapted to permit transcription of either Roman or Cyrillic characters, or a mixture of both, directly onto magnetic tape. The modified unit produces hard copy suitable for proofreading, and records information in a coding system well adapted to processing by a central computer. The coding system and the necessary physical modifications are both described. The design criteria used apply to any automatic information-processing system, although specific details are given with reference to the Univac I. The modified device is performing satisfactorily in the compilation and experimental operation of the Harvard Automatic Dictionary.

† This work has been supported in part by the Harvard Foundation for Advanced Study and Research, the United States Air Force, and the National Science Foundation.

THE PROPERTIES of a given automatic information-processing machine depend primarily on the algorithms the machine is capable of applying to the tokens¹ for the abstract elements it is said to process. Configurations of the states of sets of two-state devices, or pulse trains where pulses are present or absent in definite time intervals, are commonly used as tokens in contemporary machines. Abstract elements, e.g., the integers, are named by symbols of various kinds. For example, the numerals "2", "II", and "10" all name the number 2. Likewise, various symbols can be used to name tokens. It is a useful and widely accepted convention to use the symbol "0" as the name for one state of a two-state device, and the symbol "1" as a name for its other state. Frequently, the symbols "0" and "1" are used also as binary numerals. In a context where both these usages occur, a string such as "1001" functions homographically both as a name for the number 9 and as a name for a particular configuration of a set of four two-state devices. This practice is confusing in discourse about machines intended for or adapted to purposes other than numerical computation, especially when the relation between machine tokens and abstract elements is the chief subject of discussion. In this paper, therefore, "0" and "1" will be used exclusively as the names of tokens.

1. This term was originated by C. S. Peirce. For an explanation of the underlying distinctions, see H. Reichenbach, Elements of Symbolic Logic, Macmillan, New York, 1947, p. 4.

The mapping between machine tokens and the abstract elements a given machine is said to process can be regarded as defined by the input and output hardware of the machine. For example, if a pulse train 1010100 is to be regarded as a token for the letter A, it is desirable to arrange matters so that such a pulse train will cause a printer to print the literal "A".

When an order relation exists among the tokens in a machine, as imposed, for example, by comparison and branch instructions, and when the abstract elements themselves are an ordered set, it is usually desirable to relate abstract elements and tokens by an order-preserving mapping. For example, in a machine designed to recognize 1010100 to be "smaller" than 0010101 and 0010101 in turn to be smaller than 0010110, the mapping A → 1010100, B → 0010101, C → 0010110 preserves normal alphabetic order, whereas A → 0010101, B → 1010100, C → 0010110 does not.
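The order-preservation test in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `MACHINE_ORDER`, `RANK`, and `is_order_preserving` are hypothetical, and the machine's collating order is written out as an explicit list because the text gives only the relative order of these three tokens, not the comparison rule the machine applies.

```python
# Collating order of the three tokens in the example, smallest first.
# Assumption: listed explicitly, since the comparison rule itself is not given.
MACHINE_ORDER = ["1010100", "0010101", "0010110"]
RANK = {token: position for position, token in enumerate(MACHINE_ORDER)}


def is_order_preserving(mapping, alphabet):
    """Return True if the alphabet's order agrees with the machine order of its tokens."""
    ranks = [RANK[mapping[letter]] for letter in alphabet]
    return all(earlier < later for earlier, later in zip(ranks, ranks[1:]))


# The two mappings discussed in the text.
preserving = {"A": "1010100", "B": "0010101", "C": "0010110"}
not_preserving = {"A": "0010101", "B": "1010100", "C": "0010110"}

print(is_order_preserving(preserving, "ABC"))
print(is_order_preserving(not_preserving, "ABC"))
```

With the first mapping the machine's comparison instructions sort A, B, C alphabetically; with the second, B would sort ahead of A.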
The Univac I computer is currently in use at the Harvard Computation Laboratory in connection with the development of an operating automatic dictionary² and for basic research on the problems of automatic translation from Russian into English. The normal mapping between numbers, letters of the Roman alphabet, punctuation marks, and other standard symbols on the one hand, and machine tokens on the other, is given in Figure 2 by the columns headed "Upper Case" and "Binary Code" (except for key no. 0). This mapping is established by all input and output devices associated with the machine, in particular by the Unityper, which is used to record information onto magnetic tape, and by the High-Speed Printer, which is the major output unit. Thus, when an A is typed, a token 1010100 is recorded, and such a token will in turn cause the High-Speed Printer to print an A.

2. Oettinger, A. G., Foust, W., Giuliano, V., Magassy, K., Matejka, L., "Linguistic and Machine Methods for Compiling and Updating the Harvard Automatic Dictionary" (to be presented at the International Conference on Scientific Information, Washington D.C., November 1958, and published in the Proceedings of the conference).

Adapting a machine like the Univac to handle Cyrillic letters is conceptually a trivial matter. To permit alphabetization of Cyrillic material, an order-preserving mapping between the Cyrillic alphabet and Univac tokens is necessary. Many such mappings can readily be established. Once this has been done, the internal operation of the machine with Cyrillic material presents no difficulties. However, unless the input and output devices are physically altered, certain practical problems obviously arise.

Figure 1. Keyboard Layout

As a first step, it is simple to cover the keys on the Unityper with keytops labelled with Cyrillic letters. From the point of view of typing ease and accuracy the most desirable keyboard layout (Fig. 1) is one in standard use on ordinary Cyrillic typewriters. Unfortunately, merely replacing keytops solves only a part of the practical problem. First, the typewriter continues to print Roman letters (e.g., Q for Й), a cryptographic transformation that makes proofreading most difficult. Second, the correspondence between the Cyrillic alphabet and machine tokens established in this way does not preserve Cyrillic alphabetic order.

Figure 2. Definition of Mappings

To reconcile these conflicting demands, a composition of two successive mappings can be used.³ The first, established by the input device with covered keytops, leads to the representation of Cyrillic information in a "typewriter code." A subsequent code conversion is made automatically on the computer, at the expense of some running time, leading to the representation of Cyrillic letters in a "ranked code." The resultant mapping is order-preserving. In Figure 2, the Cyrillic letters are named in the "Lower Case" column. The token corresponding to a particular Cyrillic letter in the ranked code is named in the "Binary Coding" column, in the same row as the letter. The choice of this particular mapping was made for technical reasons described in detail elsewhere.⁴ Similar expedients have been used by others.⁵

3. Ibid.

4. Giuliano, V., "Programming an Automatic Dictionary," Design and Operation of Digital Calculating Machinery, Progress Report AF-49, Harvard Computation Laboratory, 1957, pp. I-42-I-45.

5. Edmundson, H. P., Hays, D. G., Renner, E. K., Button, R. I., "Manual for Keypunching Russian Scientific Text," RM-2061, RAND Corporation, 1957.

Figure 3. Modified Roman / Cyrillic Unityper

Recently, we modified a standard Unityper to enable both the direct conversion from Cyrillic to ranked code, and the production of Cyrillic hard copy. The necessity for a costly intermediate code conversion by the computer itself is thereby eliminated, and proofreading is made relatively easy. The layout of the keyboard of the modified typewriter is shown in Figure 1. Figure 3 is a photograph of the actual machine. A sample of the hard copy produced by the modified Unityper is shown in Figure 4. The facility for interspersing standard and Cyrillic symbols is proving extremely useful in the recording of Russian texts, as illustrated in Figure 4.
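The two routes from a Cyrillic keystroke to a ranked-code token can be sketched as follows. This is a minimal illustration, not the Laboratory's conversion program. The table and function names are hypothetical, and only the entry for Й is filled in, using the tokens the paper gives for Q (1101011) and for Й in the ranked code (0010011). That the covered Й key records the Q token is inferred from the remark that the covered keyboard prints Q for Й.

```python
# Route 1a: covered keytops only. The key labelled Й is physically the Q key,
# so the unmodified Unityper records the ordinary token for Q ("typewriter code").
TYPEWRITER_CODE = {"Й": "1101011"}

# Route 1b: the computer converts typewriter code to ranked code.
# Only the one entry known from the paper is listed here.
TYPEWRITER_TO_RANKED = {"1101011": "0010011"}

# Route 2: the modified Unityper records the ranked-code token directly.
DIRECT_RANKED = {"Й": "0010011"}


def via_conversion(letter):
    """Compose the two mappings: keystroke -> typewriter code -> ranked code."""
    return TYPEWRITER_TO_RANKED[TYPEWRITER_CODE[letter]]


def via_modified_unityper(letter):
    """Apply the single mapping established by the modified input device."""
    return DIRECT_RANKED[letter]


print(via_conversion("Й"), via_modified_unityper("Й"))
```

Both routes end at the same token. The difference is where the work is done: the first spends computer running time on every character, while the second fixes the mapping once in the hardware of the input device.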
Figure 4. Demonstration Hard Copy Produced by the Modified Unityper

In lower case, the typewriter is Cyrillic. Except for three of the very low frequency letters, the layout is standard. In upper case, the typewriter functions as a standard model, except for the absence of a few special symbols normally available, and for the presence of one infrequently used Cyrillic letter. The mapping which obtains when the typewriter is in upper case is described by the "Upper Case" and "Binary Coding" columns of Figure 2. For example, 1101011 is a token for the letter Q. In lower case, the mapping is that described by the "Lower Case" and "Binary Coding" columns. For example, 0010011 is defined as a token for the Cyrillic letter Й.

The symbols circled in the "Lower Case" column are the normal correspondents of the tokens. For example, while 0010011 is defined as a token for Й in the ranked code, it is normally a token for the semi-colon. Therefore, since the output equipment has not been modified, Cyrillic material in the ranked code still would print in cryptographic form, e.g., "56EU" for "ДЕНЬ". A fast transliteration routine developed by Andrew Kahr for converting ranked code into a standard transliteration code has proved satisfactory for experimental purposes. It yields, for example, "DEN'" for "ДЕНЬ".

Relatively few physical changes were necessary to achieve the desired modifications. Specially prepared keytops labelled as in Figure 2 had to be substituted for the normal ones. Corresponding type slugs were not available on the market, but were cast by the manufacturer from dies specially cut to our specifications. The correspondence between typewriter keys and the machine tokens is established physically by a set of encoding bails, notched in the pattern described in Figure 2. A photograph of the bail associated with the leftmost column of binary coding (Column 1) is shown in Figure 5. These bails were cut in our shop from blanks provided by the manufacturer, who undertook to harden the cut bails to his own specifications. Installing keytops, type slugs, and bails presented no unusual difficulties.

Figure 5. An Encoding Bail

The author wishes to express his appreciation to the Remington Rand Univac Division of Sperry Rand Corporation, in the persons of Messrs. Edward L. Fitzgerald and Ted Carp, for their cooperation, especially in casting type slugs to our specifications, and to Messrs. Allen Christensen and Daniel Spillane of the Staff of the Computation Laboratory for machining the bails.
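As an illustration of the two ways ranked-code material can reach the printed page, the ДЕНЬ example given earlier can be reproduced with two small lookup tables. This is a sketch under stated assumptions, not Kahr's transliteration routine, whose method the paper does not describe. The table and function names are hypothetical, the tables are keyed on letters rather than on binary tokens (the tokens for these four letters are not stated in the text), and they cover only the four letters of the example.

```python
WORD = "ДЕНЬ"

# What the unmodified High-Speed Printer shows for each letter's ranked-code
# token, read off position by position from the paper's "56EU" example.
PRINTER_SHOWS = dict(zip(WORD, "56EU"))

# Transliteration of the same letters, read off from the "DEN'" example.
TRANSLITERATION = dict(zip(WORD, "DEN'"))


def raw_printout(word):
    """Return the 'cryptographic' form produced by printing ranked code unchanged."""
    return "".join(PRINTER_SHOWS[letter] for letter in word)


def transliterate(word):
    """Return the transliterated form, one output symbol per Cyrillic letter."""
    return "".join(TRANSLITERATION[letter] for letter in word)


print(raw_printout(WORD))
print(transliterate(WORD))
```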