intTypePromotion=1
zunia.vn Tuyển sinh 2024 dành cho Gen-Z zunia.vn zunia.vn
ADSENSE

Báo cáo khoa học: "The "Spectrum" of Weak Generative Powers of Grammars"

Chia sẻ: Nghetay_1 Nghetay_1 | Ngày: | Loại File: PDF | Số trang:5

41
lượt xem
2
download
 
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

A summary is presented of some results in the literature concerning the generative powers of various formal grammars. The relative generative powers are displayed graphically.I. Introduction Many forms of grammars have been proposed in the study of such related language problems as mechanical translation, computer languages, mathematical linguistics, and the more general characterizations of natural languages.

Chủ đề:
Lưu

Nội dung Text: Báo cáo khoa học: "The "Spectrum" of Weak Generative Powers of Grammars"

  1. [Mechanical Translation and Computational Linguistics, vol.9, no.1, March 1966] The "Spectrum" of Weak Generative Powers of Grammars by Wayne A. Lea, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge A summary is presented of some results in the literature concerning the generative powers of various formal grammars. The relative generative powers are displayed graphically. Thus, our goals are: (1) the listing of references I. Introduction where relationships between grammars, languages, and Many forms of grammars have been proposed in the machines are presented and (2) the handy pictorial study of such related language problems as mechanical presentation (in a single “spectrum”) of the relative translation, computer languages, mathematical linguis- weak generative powers of such grammars and their tics, and the more general characterizations of natural corresponding machines and resulting languages. languages. It is thus interesting to inquire about the Though it is hoped that this listing and display of relationships between such grammars. In particular, grammars will be in some sense exhaustive of known one might ask which proposed grammars are the “most results, some possible grammar types may have been powerful” (in some meaningful sense) and which are missed. One advantage of the spectrum display used the most accurate characterizations of natural-language herein (Fig. 1) is that such additions can be easily re- phenomena. lated to known grammars by simply marking them at In this paper, grammars will be compared on the the appropriate positions on the spectrum. basis of the possible symbol sequences they may pro- There are some known grammars whose relation- duce—that is, on the basis of what has been called ships to other grammars are as yet unknown. The their “weak generative powers.” The relationships will "branching" of the spectrum of Figure 1 will illustrate be displayed on a “spectrum” of weak generative these uncertain relationships and thus indicate several powers of grammars. It is hoped that this concise unsolved problems in algebraic linguistics. graphical display will be found an illuminating and useful comparative summary of grammars, generated languages, and equivalent machines. II. Languages, Grammars, and Machines No attempt will be made to explain in any detail the In combinatorial systems (see reference 1 or 2) and various grammars and machines listed in this paper, formal linguistic theory (reference 3, chap, iv), a nor will the relationships discussed be proven, since language is simply a set of sequences or strings pro- they have already been considered in detail in various duced by concatenation of elements out of some finite published papers. We shall merely consider a brief vocabulary, set VT. A grammar G is then a set of rules listing of each grammar, language, or machine type, (or “productions”) for enumerating the strings belong- and references where each relationship to other gram- ing to the language. A grammar may be precisely de- mars, languages, and machines is shown. In listing fined as a 4-tuple (V, VA, S, P), where V is a finite references, our purpose is not to acknowledge the origi- non-empty vocabulary, VA (called the auxiliary vocabu- nal developers of each interrelationship but, rather lary) is a non-empty subset of V (and represents the only to provide references where demonstrations of symbols or phrase categories used at intermediate steps such relationships can be found. Although the author in the generation of a string), S (the axiom, or initial does not profess to have checked that all summarized string) is a member of VA, and P is a finite set of pro- results are valid, the literature indicates that they are. ductions which yield strings in the terminal vocabulary More important, the use of the chosen form of display (VT = V — VA) by substitutions starting with the clarifies any stated relationships between various formal axiom S. The language L generated by G is a subset of grammars and proposed grammars of natural lan- the free monoid VT* generated by concatenating mem- guages. bers of VT. Terminal strings (members of L) are pro- duced by derivations consisting of finite sequences of applications of the productions of G, starting from * This paper is a revision of a memorandum written in June, 196.5, when the author was affiliated with the Mechanical Translation Group axiom S. A production of string ψ from string φ will be of the Research Laboratory of Electronics, Massachusetts Institute symbolized as φ → ψ, while a derivation of ψ from φ of Technology. The author acknowledges the co-operation and en- is symbolized as φ ⇒ ψ. couragement of several members of that group, including its director, Victor Yngve. The help of G. H. Matthews in providing references To restrict the languages generated by grammars to and reviewing early drafts of the paper is also acknowledged. This interesting proper subsets of the free monoid VT*, it work was supported in part by the National Science Foundation (grant GN-244) and in part by the Joint Services Electronics Pro- is necessary to restrict the form of productions allowed. gram under contract DA36-039-AMC-03200(E). 10
  2. tain languages within other types will be displayed as The broadest generative power of interest in mathe- in Figure 1. The inclusion relation between languages matical linguistics is the power of a general Turing ma- is shown by the relative height on the line diagram or chine. Since Turing machines are associated with all effectively computable functions or algorithms,3 broader “spectrum”; points higher on the spectrum represent language types (sets) of which all lower points are generative power would involve sets which could not special cases (subsets), resulting from added restric- even be effectively (i.e., mechanically) enumerated. tions on the productions allowed in the grammars. Equivalent grammars are shown as a single point on III. The Spectrum the spectrum. (Thus, for example, the diagram illus- trates the inclusion of all context-free languages within We shall now consider how the weak generative the set of context-sensitive languages, which are in turn powers of various grammars and machines are related. included in all recursive sets, which are also in turn Grammars are considered to be weakly equivalent when a proper subset of the recursively enumerable sets.) they produce the same language. Types of grammars The "branching" at the lower end of the spectrum are thus equivalent if for each language produced by indicates one of two types of relationship. Either it is a grammar of one type there is a grammar of the other not presently known how some such “branched” types type which produces the same language, and vice of grammars, languages, or machines are related with versa. respect to weak generative powers, or else the types In accordance with the frequent use of line diagrams are known to be incomparable with respect to inclu- in set theory, whereby the inclusion of sets within sion. For example, it is not known whether all meta- others is pictorially displayed by showing successive linear grammars are included within the sequential subsets as successively lower points on a vertical line, grammars, or vice versa, or whether they are inter- the equivalences of grammars and the inclusion of cer- 11 GENERATIVE POWERS OF GRAMMARS
  3. grammars (Chomsky, reference 1, theorem 4). Con- secting sets, with some metalinear grammars not being text-free grammars are also called Type 2 grammars. sequential, and some sequential not metalinear. (Some Context-free grammars have been shown to be of these questions may be easy to answer, but I have weakly equivalent to normal grammars (which have made no effort to do so. Perhaps the reader may at- rules of only the forms A → BC and A → a, for a∈VT, tempt such studies.) and thus represent binary trees1,4), modified normal grammars (with no pairs of rules A → BC and D → EB IV. References allowed), admissible grammars (in which every rule The following is a list of references where each is used to generate some sentence and every generated equivalence of grammars or machines is shown, or string can be “completed” by further expansion into a where certain grammars are shown to be properly in- terminal string, so no “dangling,” unterminated deriva- cluded within other grammar types. The letter label- tions occur), and grammars with only left derivations. ing each member of this list corresponds to the letter These facts are shown in references 1, 4, 7, and 8, of the point on the spectrum which is presently being respectively. Gross9 and Gaifman10 have shown that dependency discussed. A. Davis has shown (reference 3, chap. vi) the grammars are equivalent to context-free grammars. equivalences of Turing machines, recursively-enumer- It has also been shown that context-free languages able sets, and combinatorial systems of semi-Thue, are accepted by nondeterministic push-down storage automata.4 Thus, a single point on the spectrum of weak Thue, Normal and Post types. Chomsky (reference 1, theorem 2) has shown that his Type 0 grammars are generative power represents Type 2, or context-free equivalent to these systems. (The reader should be grammars, normal grammars, modified normal gram- cautious when interpreting the present numbering mars, admissible grammars, left- (or right-) derivation scheme; Chomsky used a different one in reference 4). schemes, and non-deterministic push-down storage B. In grammars, we may often be interested in automata. Postal (reference 11, chap, iv; see also determining whether or not a sentence is a member of Chomsky, reference 12) has claimed that many gram- a language set. Those sets for which this membership mars of natural languages, such as Block's Japanese is effectively decidable are called recursive (or decid- syntax, Well's immediate-constituent grammars, Harris' able) sets. Recursive or decidable sets are known to be morpheme class substitution system, Hockett's item-and- a proper subset of recursively enumerable sets (see arrangement system, Lamb's stratificational syntax, and Davis, reference 3). tagmemics all appear to be equivalent to context-free C. The productions used in semi-Thue systems may grammars. (Such demonstrations of equivalences as be restricted to those of the form φAψ→φωψ, where a these between natural-language grammars and formal single symbol A is rewritten as a substring ω (non- grammars depend, however, on the particular explicit, null) and φ and ψ are strings from V*. This results in formal assumptions about the nature of vague, informal formal grammars called (after Chomsky) context-sen- explications in natural-language descriptions. Thus, the sitive phrase-structure grammars. Chomsky has also formal assumptions often may be contested, with dif- called them Type 1 grammars and shown that the ferent assumptions implying different formal equiva- languages generated by such grammars are properly lences. For example, by suitably weak assumptions included in the set of recursive sets (reference 1, about stratificational grammars, they can be made to theorem 3). He also showed that such grammars are generate any recursively enumerable set, rather than equivalent to grammars in which, for each rule φ→ψ just context-free languages. [I am indebted to Stanley the length of ψ is not smaller than that of φ. Peters for this example.] The assumptions involved in Kuroda5 has shown that a set is a context-sensitive the equivalences shown in Figure 1 are, however, ap- language if and only if it is accepted by a non-deter- parently the prevalent ones in the literature.) Bar- ministic linear-bounded automaton. Hillel's categorical grammars are shown to be equiva- D. In reference 4 (pp. 365-67), Chomsky suggested lent to context-free grammars in reference 13. In refer- that grammars with no rules of the form φAψ→φBψ ence 9, Gross shows a model based on predicative (where A and B are single non-terminal symbols and analyses to be equivalent to context-free grammars. F. Chomsky and Schützenberger14 have shown that either φ or ψ is not null) appear to be a proper subset of context-sensitive grammars and yet (as Parikh6 had the set of context-free languages properly includes the previously shown) contain context-free grammars (to set of metalinear languages. Metalinear grammars have be discussed under point E) as a proper subset. non-terminating rules of the form A → xBy or of the E. When the rewriting of A as ω is unrestricted by form S → φ and no rules of the form A → φSψ for any A ∈ V and φ, ψ ∈ V*. the context φ—ψ, the context-free rules of the form A → ω are obtained. Context-free grammars (with only G. Chomsky and Schützenberger14 also showed that rules of the form A → ω) have been shown to be a linear grammars (in which each non-terminating rule proper subset of context-sensitive phrase-structure is of the form A → xBy) are also a subset of metalinear 12 LEA
  4. one-sided linear languages are properly contained with- grammars, as is obvious from their form. in the set of sequential languages. H. A proper subset of the linear languages is the Chomsky1 has shown (as was mentioned in point K) minimal linear languages whose grammars have only that all regular languages are special cases of counter one non-terminal (namely, the axiom) and rules of the languages. forms S → xSy and S → c, with the additional restric- M. A special type of automaton is a member of the tion that c does not appear in the x's and y's in the set of “k-limited automata,” whose state function is rules. Clearly, a minimal linear grammar is linear, but determined by the last k symbols of input sequence. not all linear grammars are minimal. Clearly, not every finite automaton is a k-limited autom- I. Unique phrase-structure grammars, (which have aton (reference 4, pp. 333-34.) rules of the forms A → x, and A → yAz, except for the N. Those k-limited automata for which k = 1 are axiom S, which introduces all non-terminals, including called by Ginsburg “completely sequential machines.”16 itself) are clearly a subset of context-free grammars, Clearly, not every k-limited automaton is 1-limited. since each rule is a context-free rule. Apparently noth- O. A restriction on sequential grammars which does ing else is known about their relative weak generative not allow recursive rules like Ai → φAiψ gives minimal powers. J. Ginsburg and Rice15 have shown that all sequen- sequential grammars. It is apparent that minimal se- quential grammars are all sequential, and their finite tial grammars are context-free grammars and that they nature, due to not allowing reintroduction of symbols, are properly included in the context-free ones. Sequen- makes them all regular, as well. tial grammars are context-free grammars for which there exists an ordering of the non-terminal symbols such that for each i, j, if Ai ⇒ φAjψ then j ≥ i (or V. Relationship to Natural Languages equivalently, ordered such that no rule Ai → φAjψ for j < i). This restriction on the set of rules is such that An interesting question relating to this spectrum of if one symbol Ai is expanded into a string containing weak generative powers is how grammars of natural Aj, there is no derivation which, in turn, expands Aj languages fit into the spectrum. That is, what are their into a string containing Ai. apparent weak generative powers compared to those K. Counter languages were discussed by Schützen- of the formal grammars discussed above? We have al- berger in 1957 in an unpublished paper and, later, by ready seen that interest in being able to establish Chomsky,1 as being those produced by a device con- whether or not a string is a sentence of the language sisting of a finite automaton with an addition of a finite requires that the grammars be restricted to generative number of counters, each with an infinite number of power less than or equal to that which generates the positions. It is not known whether counter languages recursive sets. Furthermore, Chomsky has argued that are all context free. But it is clear that the regular the arbitrary permutations allowed by context-sensi- languages (to be discussed under point L) are all tive grammars are undesirable in grammars of natural special cases of counter languages, with the number of languages (reference 1; see also reference 12 and refer- counters equal to zero. ence 4, p. 365). Thus, powers less than those of arbi- L. If all rules of a context-free grammar are re- trary context-sensitive grammars seem to be needed for stricted to the forms A → aB or A → a, where a∈VT characterizing natural languages. and B∈VA, then what Chomsky1 calls Type 3 grammars On the other end of the spectrum, it has been are obtained. Chomsky has shown (reference 1, theo- argued that natural languages can not be adequately rem 6) that the languages produced by such grammars generated by finite-state Markov processes. Further- are exactly the finite-state languages, accepted (or pro- more, Chomsky and Postal have argued that there are duced) by finite-state automata (or Markov sources). many situations in natural languages where some con- Such languages are also referred to as regular lan- text-sensitive rules are needed for adequate descrip- guages, or one-sided linear languages. In reference 1 tion, and thus generative powers greater than that of theorem 7, Chomsky showed that Type 3 languages are context-free grammars would appear to be required. a proper subset of the Type 2 languages. Those Type These issues are discussed in references 1, 12, 2, and 2 languages which are not Type 3 languages are neces- 11. sarily self-embedding (that is, with derivations A ⇒ This, then, would result in the restriction of the φAjψ), according to Chomsky (reference 1, theorem range of weak generative powers for grammars of 11), and what distinguishes Type 3 languages from natural languages to a probable range between context- arbitrary Type 2 languages is thus the lack of self-em- sensitive and context-free grammars, as is shown on bedding. the spectrum of Figure 1. All regular languages are found to make up a proper But at least one author would disagree with the subset of the linear languages, as shown in reference 4, above placement. In reference 17, the adequacy of a page 369. finite-state model is maintained. Ginsburg and Rice15 have shown that all regular or The question of weak generative power is, of course, 13 GENERATIVE POWERS OF GRAMMARS
  5. interrelationships between sentences are additional fac- only one factor in the determination of proper gram- tors to be considered.2,4,11,12 mars of natural languages. Adequate structural descrip- Received December 1, 1965 tions of sentences and proper characterization of the References 1. Chomsky, Noam. “On Certain 15. Ginsburg, S., and Rice, G. H. 8. Matthews, G. H. “A Note on Formal Properties of Grammars,” “Two Families of Languages Re- Asymmetry in Phrase Structure I nformation and Control, V ol. 2 lated to ALGOL,” J ournal of the Grammars,” Information and Con- (1959), pp. 137-167. Association for Computing Ma- trol, V ol. 7 (1964), pp. 360-365. 2. Chomsky, Noam, and Miller, chinery, Vol. 10 (1962), pp. 350- 9. G ross, Maurice. “On the Equiva- George A. “Introduction to the 371. lence of Models of Language Formal Analysis of Natural Lan- Used in the Fields of Mechanical 16. Ginsburg, Seymour. An Introduc- guages,” in R. R. Bush, E. H. Translation and Information Re- tion to Mathematical Machine Galanter, and R. D. Luce (eds.). trieval,” Information Storage and Theory. Reading, Mass.: Addison- H andbook of Mathematical Psy- Retrieval, Vol. 2, pp. 43-57. New Wesley Publishing Co., 1962. chology, Vol. 2, pp. 269-321. York: Pergamon Press, 1964. 17. Y ngve, V. H. “A Model and an New York: John Wiley & Sons, 10. Gaifman, H. Dependency Systems Hypothesis for Language Struc- 1963. and Phrase Structure Systems. ture,” P roceedings of the Ameri- 3. Davis, Martin. Computability and ( P. 2315.) Santa Monica, Calif.: can Philosophical Society, Vol. Unsolvability. New York: Mc- RAND Corporation. 1961. 104, No. 5 (October, 1960), pp. Graw-Hill Book Co., 1958. 11. Postal, Paul. “Constituent Struc- 444.466. 4. Chomsky, Noam. “Formal Prop- ture.” (Publication 30.) Bloom- 18. Chomsky, Noam, and Miller, erties of Grammars,” in R. R. ington: Indiana University Center George A. “Finitary Models of Bush, E. H. Galanter, and R. D. in Anthropolgy, Folklore, and Language Users,” in R. R. Bush, Luce (eds.). H andbook of Math- Linguistics. (International Jour- E. H. Galanter, and R. D. Luce ematical Psychology, V ol. 2, pp. nal of American Linguistics, Vol. eds.). Handbook of Mathemati- 323-417. New York: John Wiley 30, No. 1 [January 1964]). cal Psychology, V ol. 2, pp. 419- & Sons, 1963. 12. Chomsky, Noam. Syntactic Struc- 491. New York: John Wiley & 5. Kuroda, S. Y. "Classes of Lan-“ tures. The Hague: Mouton & Co., Sons, 1963. guages and Linear-Bounded Auto- 1957. 19. Landweber, P. S. “Three Theories mata,” I nformation and Control, 13. B ar-Hillel, Y., Gaifman, C., and on Phrase Structure Grammars of V ol. 7 (1964), pp. 207-223. Shamir, E. B ulletin of the Re- Type 1,” I nformation and Con- 6. Parikh, R. “Language-Generating search Council of Israel, Sec. F. trol, V ol. 6 (1963). Devices,” MIT Research Labora- Vol. 9, No. 1 (1960). 20. McNaughton, R. “The Theory of tory of Electronics, Quarterly 14. Chomsky, Noam, and Schützen- Automata,” Advances in Com- Progress Report No. 60, Cam- berger, M. P. “The Algebraic puters, V ol. 2. New York: Aca- bridge, January, 1961, pp. 199- Theory of Context-Free Lan- demic Press, 1961. 212. guages,” in P. Braffort and D. 21. Matthews, G. H. “Discontinuities 7. G reibach, S. “Inverses of Phrase Hirschberg (eds.). Computer and Asymmetry in Phrase Struc- Structure Generators.” Ph.D. dis- Programming and Formal Sys- ture Grammars,” Information and sertation, Harvard University, tems, pp. 118-159. Amsterdam: Control, V ol. 6 (1963), pp. 137- June, 1963. North-Holland Publishing Co., 146. 1963. 14 LEA
ADSENSE

CÓ THỂ BẠN MUỐN DOWNLOAD

 

Đồng bộ tài khoản
2=>2