Database Systems: The Complete Book- P3

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

lượt xem

Database Systems: The Complete Book- P3

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Database Systems: The Complete Book- P3: Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments. Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems

Chủ đề:

Nội dung Text: Database Systems: The Complete Book- P3

  1. 176 CHAPTER 4. OTHER DATA JdODELS 4.6. SEAIISTRUCTURED DATA 177 data structures that support efficient answering of queries, as we shall discuss For flexibility in integration, the interface supports semistructured data, and begillning in Chapter 13. the user is allowed to query the interface using a query language that is suitable lret the flexibility of semistructured data has made it important in two for such data. The semistructured data may be constructed by translating the applications. We shall discuss its use in documents in Section 4.7, but here we data at the sources, using components called wrappers (or "adapters") that are shall consider its use as a tool for information integration. As databases have each designed for the purpose of translating one source to semistructured data. proliferated, it has become a common requirement that data in two or more Alternatively, the semistructured data at the interface may not exist at all. of tllem be accessible as if they were one database. For instance, companies Rather, the user queries the interface as if there were semistructured data, while may merge; each has its own personnel database, its own database of sales. the interface answers the query by posing queries to the sources, each referring inventory, product designs, and perhaps many other matters. If corresponding to the schema found at that source. databases had the same schemas, then combining them would be simple; for instance, we could take the union of the tuples in two relations that had the Example 4.27 : \%recan see in Fig. 4.19 a possible effect of information about same schema and played the same roles in the the two databases. stars being gathered from several sources. Notice that the address information for Carrie Fisher has an address concept, and the address is then broken into However, life is rarely that simple. Independently developed databases are street and city. That situation corresponds roughly to data that had a nested- unlikely to share a schema, even if they talk about the same things, such as per- relation schema like Stars(name, a d d r e s s ( s t r e e t , c i t y ) ). sonnel. For instance, one employee database may record spouse-name, another On the other hand, the address information for hiark Hamill has no address not. One may have a way to represent several addresses, phones, or emails concept at all, just street and city. This information may have come from for an employee, another database may allow only one of each. One database a schema such as Stars(name, s t r e e t , city) that only has the ability to might be relational, another object-oriented. represent one address for a star. Some of the other variations in schema that are To make matters more complex, databases tend over time to be used in so not reflected in the tiny example of Fig. 4.19, but that could be present if movie many different applications that it is impossible to shut them down and copy or information were obtained from several sources, include: optional film-type translate their data into another database, even if we could figure out an efficient information, a director, a producer or producers, the owning studio, revenue, way to transform the data from one schema to another. This situation is often and information on where the movie is currently playing. reffwed to as the legacy-database problem; once a database has been in existence for a xt-liile, it becomes impossible to disentangle it from the applications that grow up around it, so the database can never be decommissioned. 4.6.4 Exercises f r Section 4.6 o .4 possible solution to the legacy-database problem is suggested in Fig. 4.20. Exercise 4.6.1 : Since there is no schema to design in the semistructured-data We show two legacy databases with an interface; there could be many legacy model, ~t-ecannot ask you to design schemas to describe different situations. systems involved. The legacy systems are each unchanged, so they can support Rather. in the follo\ving exercises we shall ask you to suggest how particular their usual applications. data might be organized to reflect certain facts. User * a) .idd to Fig. 4.19 the facts that Star Wars was directed by George Lucas and produced by Gary Kurtz. C 0 Interface b) Add to Fig. 4.19 informat,ion about Empire Strikes Back and Return of C) the Jedi, including the facts t,hat Carrie Fisher and Mark Hamill appeared in these movies. .Add to (b) information about the studio (Fox) for these movies and t h e address of the studio (Holly~vood). * Exercise 4.6.2: Suggest llow typical data about banks and customers. as in Exercise 2.1.1. could be represented in the semistructured model. Exercise 4.6.3 : Suggest how typical data about players, teams. and fans, Figure 4.20: Integrating two legacy databases through an interface that sup- as ~vas described in Exercise 2.1.3, could be represented in the semistructured Ports semistructured data Please purchase PDF Split-Merge on to remove this watermark.
  2. 178 CHAPTER 4. OTHER DATA iiODELS 1.7. XiML AXD ITS DATA MODEL 179 Exercise : Suggest how typical data about a genealogy, as was described 4.6.4 semist,ructured data. As m-e shall see in Section 4.7.3, DTD's generally in Exercise 2.1.6, could be represented in the semistructured model. allow more flexibility in the data than does a conventional schema; DTD's often allow optional fields or missing fields, for instance. *! Exercise6 5 : The E/R model and the semistructured-data model are both 4 . . "graphical:' in nature, in the sense that they use nodes, labels, and connections among nodes as the medium of expression. Yet there is an essential difference 4.7.2 Well-Formed XML between the two models. What is it? The niinimal requirement for well-formed XML is that the document begin ~vith a declaration that it is XML, and that it have a root tag surrounding the entire body of the text. Thus, a well-formed XbIL document would have an outer 4 7 XML and Its Data Model . structure like: XML (Extensible Markup Language) is a tag-based notation for "marking" doc- uments, much like the familiar HTML or less familiar SGML. A document is nothing more nor less than a file of characters. However, while HMTL's tags ... talk about the presentation of the information contained in documents - for instance, which portion is to be displayed in italics or what the entries of a list are - XML tags talk about the meaning of substrings within the document. The first line indicates that the file is an XML document. The parameter In this section we shall introduce the rudiments of XML. We shall see t.hat it STANDALONE = "yes" indicates that there is no DTD for this document; i.e., it captures, in a linear form, the same structure as do the graphs of semistructured is a-ell-formed XRIL. Notice that this initial declaration is delineated by special data introduced in Section 4.6. In particular, tags play the same role as did markers . the labels on the arcs of a semistructured-data graph. UTethen introduce the DTD ("document type definition"), which is a flexible form of schema that lye can place on certain documents with XhiIL tags. Carrie Fisher 471 .. Semantic Tags 123Maple %. ~CITY>Hollywood Tags in XML are text surrounded by triangular brackets, i.e., , as in 5 Locust Ln. HIITL. Also as in HThlL, tags generally come in matching pairs, with a be- Malibu ginning tag like and a matching ending tag that is the same word with a slash, like .In HTRL there is an option to have tags with no matching Mark Hamill ender, like for paragraphs, but such tags are not permitted in XhIL. \T,-hen 456 Oak Rd.Brentwood tags come in matching begin-end pairs, there is a requirement that the pairs be nested. That is, between a matching pair and ,t,here can be any Star Wars1977 number of other matching pairs, but if the beginning of a pair is in this range. then the ending of the pair must also be in the range. USTAR-MOVIE-DATA> XLIL is designed to be used in two s o m e ~ h a different modes: t 1. il'ell-formed XR.IL allows you to invent your own tags, much like the arc- Figure 4.21: .In XlIL document about stars and movies labels in semistructured data. This mode corresponds quite closely to semistructured data, in that t,here is no schema, and each document is free to use whatever tags the author of the document 1%-ishes. Example 4.28 : In Fig. 4.21 is an XLIL document that corresponds roughly to the data in Fig. 4.19. The root tag is STAR-MOVIE-DATA. We see two sections 2. Valid XAIL involves a Document Type Definition that specifies the al- surrounded by the tag and its matching .Within each section Ion-able tags arid gives a grammar for how they may be nested. This are subsections giving the name of the star. One: for Carrie Fisher, has two form of SAIL is intermediate between the strict-schema models such as subsections, each giving the address of one of her homes. These sections are the relational or ODL models, and the completely schernaless world of surrounded by an tag and its ender. The section for Mark Hamill Please purchase PDF Split-Merge on to remove this watermark.
  3. 180 CHAPTER 4. OTHER DATA MODELS 4.7. XALL AND ITS DATA MODEL 181 has only entries for one street and one city, and does not use an tag ing tag is STARS (XML, like HTML, is case-insensitive, so STARS is clearly the to group these. This distinction appeared as well in Fig. 4.19. root-tag). The first element definition says that inside the matching pair of tags i\Totice that the document of Fig. 4.21 does not represent the relationship .. .we will find zero or more STAR tags, each representing a :+,tars-inV between stars and movies. We could store information about each single star. It is the * in (STAR*) that says "zero or more," i.e., "any number movie of a st,ar within the section devoted to that star, for instance: Mark Hamill O~~B~~~~WOO~ < Y E A R > ~ ~ ~ ~ < / Y E A R > < / M o v I E > w E~~~~~~~~~ < !ELEMENT STAR (NAME, ADDRESS+, MOVIES)> < !ELEMENT NAME (#PCDATA)> However, that approach leads to redundancy, since all information about the movie is repeated for each of its stars (we have shown no information except a movie's key - title and year - which does not actually represent-an instance of redundancy). We shall see in Section 4.7.5 how XML handles the problem < !ELEMENT MOVIES (MOVIE*)> that tags inherently form a tree structure. 0 < !ELEMENT YEAR (#PCDATA)> 4.7.3 Document Type Definitions In order for a computer to process XML documents automatically, there needs to be something like a schema for the documents. That is, we need to be told what tags can appear in a collection of documents and how tags can be nested. Figure 4.22: 1.1 DTD for movie stars The descriptioll of the schema is given by a grammar-like set of rules, called a document type definition, or DTD. It is intended that companies or communities The second element, STAR, is declared to consist of three kinds of subele- wishing to share dat,a will each create a DTD that describes the form(s) of the ments: NAME, ADDRESS, and MOVIES. They must appear in this order, and each documents they share and establishing a shared view of the semantics of their must be present. Ho~vever, + following ADDRESS says "one or more"; that the tags. Fo; instance, there could be a DTD for describing protein structures, a is, there can be any number of addresses listed for a star, but there must be at DTD for dmcribing t,he purchase and sale of auto parts, and so on. least one. The NAME element is then defined to be *PCD.lTAl7'i.e., simple test. The gross structure of a DTD is: The fourth element says that an address element consists of fields for a street and a city, in that order. < !DOCTYPE root-tag [ Then, the MOVIES element is defined to have zero or more elements of type MOVIE within it; again, the * says "any number of." A MOVIE element is defined more elements to consist of title and year fields, each of which are simple text. Figure 4.23 is 1> an example of a document that conforms to the DTD of Fig. 4.22. o . .. I. The root-tag is used (with its matching ender) to surround a document that .' conforms to the rules of this DTD. An element is described by its name, which is The components of an element E are generally other elements. They must the tagused to surround portions of the document that represent that element, appear between the tags and in the order listed. Horr-ever. there and a parenthesized list of components. The latter are tags that may or must are several operators that control the number of times e1etllent.s appear. appear within the tags for the element being described. The exact requirements on each coniponlent are indicated in a manner we shall see short,lg. 1. A * follorving an element means that the element nlay occur any tiutllbcr There is, however, an important special case. (#PCDATA) after an element of times, including zero t,imes. name means that element has a value that is text, and it has no tags nested 2. A + followingan element means that the element may occur one or more within. times. Exampie 4.29 : In Fig. 4.22 rve see a DTD for stars." The name and surround- 3. A ? following an element nieans that the element may occur either zero 'Sote that the stars-and-movies data of Fig. 4.21 is not intended to conform to this DTD. times or one time, but no more. \ Please purchase PDF Split-Merge on to remove this watermark.
  4. CHAPTER 4. OTHER D.4T)l AZODELS .7. X&IL AND ITS DATA iVIODEL CarrieFisher Example 4.30 : Here is how we might introduce the document of Fig. 4.23 to 123Maple St. assert that it is intended to conform to the DTD of Fig. 4.22. HO~~~WOO~ 5Locust Ln.
  5. 184 CHAPTER 4. OTHER DATA MODELS 4.7. XA4L AND ITS DATA lZlODEL 185 node to street and city nodes. (STARS-MOVIES> (STAR starId = "cf" starredIn = "sw, esb, rj"> < !ELEMENT ADDRESS (STREET, CITY )> Carrie Fisher 123 Maple St. Hollywood SLocust Ln. Malibu movieId ID starsOf IDREFS (STAR starId = "mh" starredIn = "sw, esb, rj"> Mark Hamill 456 Oak Rd. I> Brentwood Figure 4.24: A DTD for stars and movies, using ID'S and IDREF'S Star Wars 1977 Example 4.31 : Figure 4.24 shows a revised DTD, in which stars and movies are given equal status, and ID-IDREF correspondence is used to describe the Empire Strikes Back many-many relationship between movies and stars. Analogously, the arcs be- 1980 tween nodes representing stars and movies describe the same many-many rela- tionship in the semistructured data of Fig. 4.19. The name of the root tag for this DTD has been changed to STARS-MOVIES,and its elements are a sequence Return of the Jedi of stars followed by a sequence of movies. 1983 . star no longer has a set of movies as subelements. as was the case for the 1 DTD of Fig. 4.22. Rather, its only subelements are a name and address. and in the beginning tag we shall find an attribute starredIn whose value is a list of ID'S for the movies of the star. Sote that the attribute starredIn is declared to be of type IDREFS,rather than IDREF. The additional "S" allo~s-s the Figure 4.25: Example of a document following the DTD of Fig. 4.24 value of starredIn to be a list of ID's for movies, rather than a single mot-ie. as would be the case if the type IDREF were used. A tag also has an attribute starId. Since it is declared to be of 4.7.6 Exercises for Section 4.7 type ID: the value of starId may be referenced by tags to indicate the stars of the movie. That is, when we look at the attribute list for MOVIE in Exercise 4.7.1 : Add to the document of Fig. 4.25 the follo~ving facts: Fig. 4.24. we see that it has an attribute movieId of type ID: these are the ID'S that will appear on lists that are the values of starredIn tags. Symmetrically. * a) Harrison Ford also starred in the three movies mentioned and the nio~ie the attribute starsOf of MOVIE is a list of ID's for stars. Witness (1985). Figure 4.25 is an example of a document that conforms to the DTD of b) Carrie Fisher also starred in Hannah and Her Sisters (1985). Fig. 4.24. It is quite similar to the semistrl~ctured data of Fig. 4.19. It includes "Ore data - three movies instead of only one. However, the only structural c) Liam Seeson starred in The Phantom Menace (1999). Please purchase PDF Split-Merge on to remove this watermark.
  6. 186 CHAPTER 4. OTHER DATA MODELS 4.9. ,REFEREhTCESFOR CHAPTER 4 187 * Exercise 4.7.2 : Suggest how typical data about banks and customers, as was major features of object-orientation. These extensions include nested re- described in Exercise 2.1.1, could be represented a s a DTD. lations, i.e., complex types for attributes of a relation, including relations as types. Other extensions include methods defined for these types, and Exercise 4.7.3 : Suggest how typical data about players, teams, and fans, as the ability of one tuple to refer to another through a reference type. was described in Exercise 2.1.3, could be represented as a DTD. + ~emlstructuredData: In this model, data is represented by a graph. Exercise 4.7.4 : Suggest how typical data about a genealogy, as was described Nodes are like objects or values of their attributes, and labeled arcs con- in Exercise 2.1.6, could be represented as a DTD. nect an object to both the values of its attributes and to other objects to which it is connected by a relationship. 4.8 Summary of Chapter 4 + XML: The Extensible Markup Language is a World-Wide-Web Consor- + Object Definition Language: This language is a notation for formally de- tium standard that implements semistructured data in documents (text scribing the schemas of databases in an object-oriented style. One defines files). Nodes correspond to sections of the text, and (some) labeled arcs classes, which may have three kinds of properties: attributes, methods, are represented in XML by pairs of beginning and ending tags. and relationships. + Identifiers and References in XML: To represent graphs that are not trees, + ODL Relationships: A relationship in ODL must be binary. It is repre- XML allows attributes of type I D and IDREF within the beginning tags. sented, in the two classes it connects, by names that are declared to be A tag (corresponding to a node of semistructured data) can thus be given inverses of one another. Relationships can be many-many, many-one, or an identifier, and that identifier can be referred to by other tags, from one-one, depending on whether the types of the pair are declared to be a which we would like to establish a link (arc in semistructured data). single object or a set of objects. + The ODL Type System: ODL allows types to be constructed, beginning 4.9 References for Chapter 4 with class names and atomic types such as integer, by applying any of the following type constructors: structure formation, set-of, bag-of, list-of, The manual defining ODL is [6]. It is the ongoing work of ODLIG, the Object array-of, and dictionary-of. Data Management Group. One can also find more about the history of object- oriented database systems from [4], [5], and [8]. + Extents: A class of objects can have an extent, which is the set of objects of Semistructured data as a model developed from the TSIRIXIIS and LORE that class currently exist,ingin the database. Thus, the extent corresponds projects at Stanford. The original description of the model is in [9]. LORE and to a relation instance in the relational model, while the class declaration its query language are described in [3]. Recellt surveys of work on semistruc- is like the schema of a relation. tured data include [I], [lo], and the book [2]. .A bibliography of semistructured + Keys in ODL: Keys are optional in ODL. One is allo~r-edto declare one data is being compiled on the Web, at [7]. or more keys, but because objects have an object-ID that is not one of its XXIL is a standard developed by the Xorld-\Vide-Web Consortium. The propert,ies, a system implementing ODL can tell the difference between home page for information about XXIL is [Ill. objects, even if they have identical values for all properties. 1. S. Abiteboul, "Querying semi-structured data," Proc. Intl. Conf. on Dnta- + Converting ODL Designs to Relations: If rve treat ODL as only a de- base Theory (1997); Lecture Sotes in Computer Science 1187 (F. Afrati sign language, whose designs are then converted to relations, the simplest and P. Kolaitis, eds.), Springer-Verlag, Berlin, pp. 1-18. approach is to create a relation for a the attributes of a class and a re- lation for each pair of inverse relationships. However. we can combine a 2. Abiteboul, S., D. Suciu, and P. Buneman, Data on the Web: From Rela- many-one relationship with the relation intended for the attributes of the taons to Semistructured Data and Xml, X4organ-Icaufmann, San Francisco, "manyn class. It is also necessary to create new attributes to represent the key of a class that has no key. 3. -4biteboul S., D. Quass, J. McHugh, J. IVidom, and J. L. Weiner, "The + The Object-Relational Model: An alternative to pure object-oriented data- LOREL query language for semistructured data,'' In J. Digital Libraries base models like ODL is to extend the relational model to include the Please purchase PDF Split-Merge on to remove this watermark.
  7. CHAPTER 4. OTHER DATA MODELS 4. Bancilhon, F., C. Delobel, and P. Kanellakis, Building an Object-Oriented Database System, Morgan-Kaufmann, San Francisco, 1992. 5. Cattell, R. G. G., Object Data Management, Addison-Wesley, Reading, MA, 1994. 6. Cattell, R. G. G. (ed.), The Object Database Standard: ODMG-99,XIor- gan-Kaufmann, San Francisco, 1999. 7. L. C. Faulstich,"faulstic/bib/semistruct/ 8. Kim, W. (ed.), Modern Database Systems: The Object Model, Interoper- Relational Algebra ability, and Beyond, ACM press, New York, 1994. 9. Pa.pakonstantinou, Y., H. Garcia-Molina, and idom, om, "Object es- change across heterogeneous information sources," IEEE Intl. Conf. on Data Engineering, pp. 251-260, March 1995. This chapter begins a study of database programming, that is, how the user can ask queries of the database and can modify the contents of the database. Our 10. D. Suciu (ed.) Special issue on management of semistructured data, SIG- focus is on the relational model! and in particular on a notation for describing MOD Record 26:4 (1997). queries about the content of relations called "relational algebra." 11. NJorld-Wide-Web Consortium, h t t p : //www While ODL uses methods that, in principle, can perform any operation on data, and the E/R model does not embrace a specific way of manipulating data, the relational model has a concrete set of "standard" operations on data. Surprisingly, these operations are not "Turing complete" the way ordinary pro- gramming languages are. Thus, there are operations we cannot express in relational algebra that could be expressed, for instance, in ODL methods writ- ten in C++. This situation is not a defect of the relational model or relational algebra, because the advantage of limiting the scope of operations is that it becomes possible to optimize queries written in a very high level language such as SQL, tvhich we introduce in Chapter 6. We begin by introducing the operations of relational algebra. This algebra formally applies to sets of tuples, i.e., relations. Hoxvever, commercial DBkIS's use a slightly different model of relations, which are bags, not sets. That is, relations in practice may contain duplicate tuples. While it is often useful to think of relational algebra as a set algebra, we also need to be conscious of the effects of duplicates on the results of the operations in relational algebra. In the final section of this chapter, n-e consider the matter of how constraints on relations can be expressed. Later chapters let us see the languages and features that today's commercial DBMS's offer the user. The operations of relational algebra are all implemented by the SQL query language, which we study beginning in Chapter 6. These algebraic operations also appear in the OQL language, an object-oriented query language based on the ODL data model and introduced in Chapter 9. 189 Please purchase PDF Split-Merge on to remove this watermark.
  8. 190 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATION-4L OPER.4TIONS 191 5.1 An Example Database Schema Our schema has five relations. The attributes of each relation are listed, along with the intended domain for that attribute. The key attributes for a relation are shown in capitals in Fig. 5.1, although when we refer to them in As we begin our focus on database programming in the relational model, it is useful to have a specific schema on which to base our examples of queries. Our text, they will be lower-case as they have been heretofore. For instance, all chosen database schema draws upon the running example of movies, stars, and three attributes together form the key for relation S t a r s I n . Relation Movie has six attributes; t i t l e and year together constitute the key for Movie, as studios, and it uses normalized relations similar to the-ones that we developed in Section 3.6. However, it includes some attributes that we have not used pre- they have previously. Attribute t i t l e is a string, and year is an integer. viously in examples, and it includes one relation - MovieExec - that has not The major nlodifications to the schema compared mit,h what we have seen appeared before. The purpose of these changes is to give us some opportunities to study different data types and different ways of representing information. There is a notion of a certificate number for movie executives - studio Figure 5.1 shows the schema. presidents and movie producers. This certificate is a unique integer that we imagine is maintained by some external authority, perhaps a registry of executives or a "union." Movie ( TITLE: s t r i n g , \Ire use certificate numbers as the key for movie executives, although YEAR: integer, movie stars do not al~vays have certificates and we shall continue to use length: integer, name as the key for stars. That decision is probably unrealistic, since incolor: boolean, two stars could have the same name, but we take this road in order to studioName: s t r i n g , illustrate some different options. producerC#: integer) \Ve introduced the producer as another property of movies. This infor- mation is represented by a new attribute, producerC#, of relation Movie. StarsIn( This attribute is intended to be the certificate number of the producer. MOVIETITLE: s t r i n g , Producers are expccted to be moyie executives, as are studio presidents. MOVIEYEAR: i n t e g e r , There may also be other esecutives in the MovieExec relation. STARNAME: s t r i n g ) Attribute f ilmType of Movie has been changed from an enumerat,ed type Moviestar( to a boolean-valued attribute called incolor: true if the movie is in color N M : string, A E and false if it is in black and white. address: s t r i n g , gender : char, The attribute gender has been added for movie stars. Its type is "char- birthdate: date) acter," either M for male or F for female. Attribute birthdate, of type "date" (a special type supported by many commercial database systems HovieExec( =g, or just a character string if we prefer) has also been added. name: s t r i n g , All addresses have been made strings, rather than pairs consisting of a address: s t r i n g , street and city. The purpose is to make addresses in different relations CERT# : integer , comparable easily and to simplify operations on addresses. networth: integer) Studio ( 5.2 An Algebra of Relational Operations - - N M : string, A E address: s t r i n g , TO begin our study of operations on relations. we shall learn about a special presC#: integer) algebra, called relattonal algebra, that consists of some simple but po\ierful nays to construct new relations from given relations. When the giwn relations are stored data, then the constructed relations can be answers to queries about this Figure 5.1: Example database schema about movies \ Please purchase PDF Split-Merge on to remove this watermark.
  9. 192 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATIOXAL OPERATIONS 193 2. Constants, which are finite relations. Why Bags Can Be More Efficient Than Sets . s we mentioned, in the classical relational algebra, all operands and the results A As a simple example of why bags can lead to implementation efficiency, if of expressions are sets. The operations of the traditional relational algebra fall you take the union of two relations but do not eliminate duplicates, then into four broad classes: you can just copy the relations to the output. If you insist that the result be a set, you have to sort the relations, or do something similar to detect a) The usual set operations - union, intersection, and difference - applied identical tuples that come from the two relations. to relations. b) Operations that remove parts of a relation: "selection" eliminates some rows (tuples), and "projection" eliminates some columns. The development of an algebra for relations has a history, which we shall c) Operations that combine the tuples of two relations, including "Cartesian follow roughly in our presentation. Initially, relational algebra was proposed product," which pairs the tuples of two relations in all possible ways, and by T. Codd as an algebra on sets of tuples (i.e., relations) that could be used various kinds of "join" operations, which selectively pair tuples from two to express typical queries about those relations. It consisted of five operations relations. on sets: union, set difference, and Cartesian product, with which you might already be familiar, and two unusual operations - selection and projection. d) An operation called .'renamingx that does not affect the tuples of a re- To these, several operations that can be defined in terms of these were added: lation, but changes the relation schema, i.e., the names of the attributes varieties of "join" are the most important. and/or the name of the relation itself. When DBMS's that used the relational model were first developed, their query languages largely implemented the relational algebra. However, for ef- IVe shall generally refer to expressions of relational algebra as 9uerie.s. \Yhile ficiency purposes, these systems regarded relations as bags, not sets. That is. we don't yet have the symbols needed to sho~vmany of the expressions of unless the user asked explicitly that duplicate tuples be condensed into one (i.e., relationaj algebra, you should be familiar with the operations of group (a). and that "duplicates be eliminated"), relations were allowed to contain duplicates. thus recognize (R U S) as an esainple of an expression of relational algebra. Thus, in Section 5.3, we shall study the same relational operations on bags and R and S are atomic operands standing for relations, whose sets of tuples are see the changes necessary. unknown. This query asks for the union of whatever tuples are in the relations .inother change to the algebra that was necessitated by commercial imple- named R and S. mentations of the relational model is that several other operations are needed. Nost important is a way of performing aggregation, e.g., finding the average 5.2.2 Set Operations on Relations value of some column of a relation. We shall study these additional operations in Section 5.4. The three most common operations on sets are union. intersection; and differ- ence. \Ye assume the reader is familiar with these operations. n-hich are defined as follo~vs arbitrary sets R and S: on 5.2.1 Basics of Relational Algebra Xn algebra, in general, consists of operators and atomic operands. For in- R U S: the m i o n of R and S; is the set of elements that are in R or S or both. An element appears only once in the union even if it is present in stance, in the algebra of arithmetic, the atomic operands are variables like .r both R and S. and constants like 15. The oDerators are the usual arithmetic ones: addition. subtraction, multiplication, and division. Any algebra allows us to build ez- R n S ? the in,ter.sectionof R and S . is the set of elelilents that are in both pressions by applying operators to atomic operands and/or other expressiolls R and S . of the algebra. Usually, parentheses are needed to group operators and their operands. For instance, in arithmet,ic we have expressions such as (x + y) * z or R - S , the difference of R and S , is the set of elements that are in R but ((x + 7)/(y - 3)) x. + not in S . Sote that R - S is different froni S - R; the latter is the set of Relational algebra is another example of an algebra. Its atomic operallds elements that are in S but not in R. are: When we apply these operations to relations, tve need to put some conditions 1. Variables that stand for relat,ions. Please purchase PDF Split-Merge on to remove this watermark. ., .,
  10. 194 CHAPTER 5. RELATIONAL ALGEBR-4 5.2. AN ALGEBRA OF REL-4TION4L OPERATIONS 195 1 R and S must have schemas with identical sets of attributes, and the . Xow, only the Carrie Fisher tuple appears, because only it is in both relations. types (domains) for each attribute must be the same in R and S. The difference R - S is 2. Before me compute the set-theoretic union, intersection, or difference of name I address I gender I birthdate sets of tuples, the columns of R and S must be ordered so that the order Mark Hamill 1 456 Oak Rd. , Brentwood ( M ( 8/8/88 of attributes is the same for both relations. That is, the Fisher and Hamill tup!es appear in R and thus are candidates for R - S. Horn-ever: the Fisher tuple also appears in S and so is not in R - S. Sometimes we would like to take the union, intersection, or difference of relations that have the same number of attributes, with corresponding domains. but that use different names for their attributes. If so, we may use the renaming 5.2.3 Projection operator to be discussed in Section 5.2.9 to change the schema of one or both The projection operator is used to produce from a relation R a new relation relations and give them the same set of attributes. that has only some of R's columns. The value of expression ~ T A ~ , . ~ ~ (R). isA ~ ,.. , a relation that has only the columns for attributes A1, A2,. . . ,A, of R. The schema for the resulting value is the set of attributes {Ax, -42,. . . ,A,), which name address gender birthdate we conventionally show in the order listed. Carrie Fisher 123 Maple S t . , Hollywood F 9/9/99 Mark H a i l 1 456 Oak Rd., Brentwood M 8/8/88 title year length incolor studioName producerC# Relation R S t a r Wars 1977 124 true Fox 12345 Mighty Ducks 1991 104 true Disney 67890 Wayne's World 1992 95 true Paramount 99999 name address gender birthdate Carrie Fisher 123 Maple S t . , Hollywood F 9/9/99 Harrison Ford 789 Palm Dr., Beverly H i l l s M 7/7/77 Figure 3.3: The relation Movie Relation S Example 5.2 : Consider the relation Movie with the relation schema described in Section 5.1. - 1 instance of this relation is shown in Fig. 5.3. We can project 11 Figure 5.2: TIYOrelations this relation onto the first three attributes with the expression 7 10 title.year.length (Movie) Example 5.1 : Suppose we have the two relations R and S: instances of the The resulting relation is relation Moviestar of Section 5.1. Current instances of R and S are shon-n in Fig. 5.2. Then the union R U S is title I year 1 length name address gender birthdate Carrie Fisher 123 Maple S t . , Hollywood F 9/9/99 Mark Harnill 456 Oak Rd., Brentwood M 8/8/88 -1s another example. n-e can project onto the attribute i n c o l o r xith the Harrison Ford 789 Palm Dr., Beverly H i l l s M 7/7/77 expression ;ii,lc,rc.,(Movie). The result is the single-column relation Sote that the two tuples for Carrie Fisher from the two relations appear only inColor once in the result. true The intersection R n S is Sotice that there is only one tuple in the resulting relation, since all three tuples name 1 address 1 gender I birthdate of Fig. 5.3 have the same value in their component for attribute i n c o l o r , and C a r r i e Fisher 1 123 Maple S t . , Hollywood IF in the relational algebra of sets, duplicate tuples are always eliminated. 0 Please purchase PDF Split-Merge on to remove this watermark.
  11. 196 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATIOArS4LOPERATIOh*S 197 5.2.4 Selection 5.2.5 Cartesian Product The selection operator, applied to a relation R, produces a new relation with a The Cartesian product (or cross-product, or just product) of two sets R and subset of R's tuples. The tuples in the resulting relation are those that satisfy S is the set of pairs that can be formed by choosing the first element of the some condition C that involves the attributes of R. We denote this operation pair to be any element of R and the second any element of S. This product uc(R). The schema for the resulting relation is the same as R's schema, and is denoted R x S . When R and S are relations, the product is essentially the we conventionally show the attributes in the same order as we use for R. same. However, since the members of R and S are tuples, usually consisting C is a conditional expression of the type with which we are familiar from of more than one component, the result of pairing a tuple from R with a tuple conventional programming languages; for example, conditional expressions fol- from S is a longer tuple, with one component for each of the components of low the keyword i f in programming languages such as C or Java. The only the constituent tuples. By convention, the components from R precede the difference is that the operands in condition C are either constants or attributes components from S in the attribute order for the result. of R. We apply C to each tuple t of R by substituting, for each attribute rl The relation schema for the resulting relation is the union of the schemas appearing in condition C, the component of t for attribute A. If after substi- for R and S. However, if R and S should happen to have some attributes in tuting for each attribute of C the condition C is true, then t is one of the tuples common, then we need to invent new names for at least one of each pair of that appear in the result of uc(R); otherwise t is not in the result. identical attributes. To disambiguate an attribute A that is in the sclemas of both R and S , we use R..4 for the attribute from R and S.A for the attribute Example 5.3: Let the relation Movie be as in Fig. 5.3. Then the wlue of from S . expression ul,,,th2~oo(Movie) is title year length incolor studioName producerC# Star Wars 1977 124 true Fox 12345 Mighty Ducks 1991 104 true Disney 67890 Relation R The first tuple satisfies the condition length 2 100 because when we substitute for length the value 124 found in the component of the first tuple for attribute length, the condition becomes 124 2 100. The latter condition is true, so xe accept the first tuple. The same argument explains why the second tuple of Fig. 5.3 is in the result. The third tuple has a length component 95. Thus, when we substitute for length n-e get the condition 95 2 100, which is false. Hence the last tuple of Fig. 5.3 is not in the result. 0 Relation S Example 5.4: Suppose we want the set of tuples in the relation Movie that represent Fox movies at least 100 minut,es long. We can get these tuples with a more complicated condition, involving the AND of two subconditions. The expression is fllength>lOO AND studioName='FoxJ The tuple Result R x S title 1 year 1 length I inColor ] studioName 1 producerC# Star Wars 1 1977 ( 124 1 t r u e 1 Fox Figure 5.3: Tn-o relations and their Cartesian product, is the only one in the resulting relation. Please purchase PDF Split-Merge on to remove this watermark.
  12. 198 CHAPTER 5. RELATIONAL A L G E B m 5.2. A X ALGEBRA OF RELATIOX-4L OPERATIOlW 199 Example 5.5 : For conciseness, let us use an abstract example that illustrates Example 5.6: The natural join of the relations R and S from Fig. 5.4 is the product operation. Let relations R and S have the schemas and tuples shown in Fig. 5.4. Then the product R x S consists of the six tuples shown in that figure. Note how we have paired each of the two tuples of R with each of the t,hree tuples of S. Since B is an attribute of both schemas, we have used R.B and S.B in the schema for R x S. The other attributes are unambiguous, and their names appear in the resulting schema unchanged. The only attribute common to R and S is B. Thus, to pair successfully, tuples need only to agree in their B components. If so, the resulting tuple has corn- ponents for attributes A (from R), B (from either R or S), C (from S ) , and D 5.2.6 Natural Joins In this example, the first tuple of R successfully pairs with only the first More often than we want to take the product of two relations, we find a need to tuple of S ; they share the value 2 on their common attribute B. This pairing join them by pairing only those tuples that match in some way. The sinlplest ~ields the first tuple of the result: (1,2,5,6). The second tuple of R pairs sort of match is the natural join of t~vo relations R and S , denoted R w S, in successfully only with the second tuple of S, and the pairing yields (3,4,7,8). which we pair only those tuples from R and S that agree in whatever attributes Note that the third tuple of S does not pair with any tuple of R and thus has are common to the schenlas of R and S. More precisely, let A1, A2, . ..,A, be 1 0 effect on the result of R w S . X tuple that fails to pair n-it11 any tuple of 1 all the attributes that are in both the schema of R and the schema of S. Then the other relation in a join is said a dangling tuple. 0 a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes ill, A*, . ..,A,. Example 5.7: The previous exalnple does not illustrate all the possibilities If the tuples r and s are successfully paired in the join R w S, then the inherent in the natural join operator. For example, no tuple paired successfully result of the pairing is a tuple, called the joined tuple, with one component for with more than one tuple. and there was only one attribute in common to the each of the attributes in the union of the schemas of R and S. The joined tuple two relation schemas. In Fig. 5.6 we see two other relations, Ci and I;, that share agrees with tup!e r in each attribut,e in t.he schema of R, and it agrees with s tu-o attributes between their schcmas: B and C. We also show an instance in in each attribute: i r ~ schema of S. Since r and s are successfully paired, the the which one tuple joins with se~eral tuples. joined tuple is able to agree with both these tuples on the attributes they have For tuples to pair successfully, they must agree in both the B and C conl- in common. The construction of the joined tuple is suggested by Fig. 5.5. ponents. Thus, the first tuple of C joins with the first t~vo tuples of I ,tvhile ' the second and third tuples of li join with the third tuple of I-. The result of R these four pairings is shown in Fig. 3.6. 0 5.2.7 Theta-Joins The natural join forces us t,o pair tuples using one specific condition. 1l7hilethis vay, equating shared attributes, is the most common basis on n-hich relations are joined, it is sometinles desirable to pair tuples from two relations on some other basis. For that purpose, we have a related notation called the theta- join. Historically the "theta" refers to an arbitrary condition. which ~ve~shall represent by C rather than 0. The notation for a theta-join of relations R and S based on condition C is Figure 3.5: Joining tuples 7 R S . The result of this operation is constructed as follo~vs: Sate also that this join operation is the same one that Ire used in Scc- 1. Take the product of R and S. tion 3.6.5 to recombine relations that had been project,ed onto two subsets of 2. Select frorn the product only those tuples that satisfy the condition C. their attributes. There the motivation was to explain why BCNF decomposi- tion made sense. In Section 5.2.8 we shall see another use for t,he natural join: As with the product operation, the schema for the result is the union of the combining two relations so that we can write a query t,hat relates attributes of schemas of R and S. with "R," or "S." prefised to attributes if necessary to each. indicate from which schema the attribute came. Please purchase PDF Split-Merge on to remove this watermark.
  13. CHAPTER 5. RELATIONAL ALGEBR.4 5.2.. AN ALGEBRA OF RELATIONAL OPERATIOIW 201 Relation U Figure 5.7: Result of U ATDV Example 5.9 : Here is a theta-join on the same relations U and V that has a more complex condition: Relation V * A
  14. - 202 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATION-4L OPERATIOXS 203 2. Select those Movies tuples that have studioiVame = 'Fox'. Equivalent Expressions and Query Optimization 3. Compute the intersection of (1) and (2). All database systems have a query-answering system, and many of them 4. Project the relation from (3) onto attributes t i t l e and year. are based on a language that is similar in expressive power to relational algebra. Thus, the query asked by a user may have many equivalent expres- sions (expressions that produce the same answer, whenever they are given the same relations as operands), and some of these may be much more quickly evaluated. An important job of the query "optimizer" discussed briefly in Section 1.2.5 is to replace one expression of relational algebra by an equivalent expression that is more efficiently evaluated. Optimization of relational-algebra expressions is covered extensively in Section 16.2. Moviesl with schema { t i t l e , year, length, filmType, studioName) Movies2 with schema { t i t l e , year, starName) Movies Movies Let us write an expression to answer the query "Find the stars of movies that Figure 5.8: Expression tree for a relational algebra expression are at least 100 minutes long." This query relates the starName attribute of Movies2 with the length attribute of Moviesl. \Ire can connect these attrihutes In Fig. 5.8 we see the above steps represented as an expression tree. The by joining the two relations. The natural join successfi~llypairs only those tuples two selection nodes correspond to steps (1) and (2). The intersection node that agree on t i t l e and year: that is, pairs of tuples that refer to the same corresponds to step (3), and the projection node is step (4). movie. Thus, Moviesl w Movies2 is an expression of relational algebra that Alternatively, we could represent the same expression in a conventional. produces the relation we called Movies in Esample 3.24. That relation is the linear notation, with parentheses. The formula non-BCNF relation whose schema is all sis attributes and that contains several tuples for the same movie when that movie has several stars. To the join of Moviesl and Movies2 Ive must apply a selection that enforces the condition that the length of the movie is at least 100 minutes. \ire then represents the same expression. project onto the desired attribute: starName. The expression Incidentally, there is often more than one relational algebra expression that represents the same computation. For instance, the above query could also be written by replacing the intersection by logicd AND within a single selection operation. That is, implements the desired query in relational algebra. Ttitle.yea~(glength>1oo AND PoxJ( ~ o v i e s ) ) 5.2.9 Renaming is an equivalent form of the query. In order to control the names of the attrihutes used for relations that are con- structed by applying relational-algebra operations, it is often convenient to Example 5.11 : One use of t,he natural join operation is to recombine relations use an operator that explicitly renames relations. We shall use the operator that were decomposed to put them into BCNF. Recall the decomposed relations PS(A~,A~,...,A,)(R)to rename a relation R. The resulting relation has exactly from Example 3.24:l the same tuples as R, but the name of the relation is S. lloreover, the at- ernem ember that the relation Movies of that example has a somewhat different relation tributes of the result relation S are named dl: . . ,.A,? in --Iz,. order from the schema from the relation Movie that we introduced in Section 5.1 and used in Examples 5.2, left. If we only want to change the name of the relation to S and leave the 5.3, and 5.4. attributes as they are in R, we can just say ps(R). Please purchase PDF Split-Merge on to remove this watermark.
  15. 204 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATIOXAL OPERATIONS 205 Example 5.12 : In Example 5.5 we took the product of two relations R and s .is an alternative, we could take the product without renaming, as we did in from Fig. 5.4 and used the convention that when an attribute appears in both 5.5, and then rename the result. The expression PRS(A,B,X,C.D)(R) xS operands, it is renamed by prefixing the relation name to it. These relations R ields the same relation as in Fig. 5.9, with the same set of attributes. But this and S are repeated in Fig. 5.9. &tion has a name, RS, while the result relation in Fig. 5.9 has no name. O Suppose, howetrer, that we do not wish to call the two versions of B by names R.B and S.B; rather we want to continue to use the name B for the 5.2.10 Dependent and Independent Operations attribute that comes from R, and we want to use X as the name of the attribute B coming from S. ?Ve can reriame the attributes of S so the first is called x. Some of the operations that we have described in Section 5.2 can be expressed The result of the expression p s ( x , c , ~ ) ( S )a relation named S that looks just is in terms of other relational-algebra operations. For example, intersection can like the relation S from Fig. 5.4, but its first column has attribute X instead be expressed in terms of set difference: of B. RnS=R-(R-S) That is, if R and S are any two relations with the same schema, the intersection of R and S can be computed by first subtracting S from R to form a relation T consisting of all those tuples in R but not S . TVe then subtract T from R, leaving only those tuples of R that are also in S. Relation R The two forms of join are also expressible in terms of other operations. Theta-join can be expressed by product and selection: R 7 S =uc(Rx S) The natural join of R and S can be expressed by starting with the product R x S . n'e then apply the selection operator with a condition C of the form Relation S R..A1 = S.Al AND = S..A2 AND. . . AND R.& = s.-& \\-here .AI: A2:.. . ,'4, are all the attributes appearing in the schemas of both R and S. Finally, we must project out one copy of each of the equated attributes. Let L be the list of attributes in the schema of R follo~\-ed those attributes by in the schema of S that are not also in the schema of I?. Then RW s = r L ( u c ( x s)) ~ Example 5.13: The natural join of the relations U and V from Fig. 5.6 can be witten in terms of product, selection, and projection as: ~ e s u l R x Ps(.Y,c,D) t (s) r.asa.c,o(gu.B=t.e AND r..c=t:c(~~ x 1;)) Figure 5.9: Renaming before taking a product That is. \\-e take the product C x I,-. Then we select for equality between each pair of attributes \vith the same name -- B and C in this example. Finall>-. we project onto all the attributes except one of the B's and one of the C's: xve When 11-e take the product of R with this nex relation, there is no conflict have chosen to eliminate the attributes of 1- whose names also appear in the of names among the attributes, so no further renaming is done. That is, the schema of U. of the expression R x ~ s ( x , c , ~ )is S ) relation R x S from Fig. 5.4. ( the For another example, the theta-join of Example 5.9 can be n-ritten that the five columns are labeled A, B, S, , and D,froln the left. This C relation is shown in Fig. 5.9. U.A
  16. 206 CHAPTER 5. RELATIONAL ALGEBRA 5.2. AN ALGEBRA OF RELATIONAL OPERATIONS That is, we take the product of the relations U and V and then apply the 5.2.12 Exercises for Section 5.2 condition that appeared in the theta-join. Exercise 5.2.1 : In this exercise we introduce one of our running examples of The rewriting rules mentioned in this section are the only "redundancies" a relational database schema and some sample data.2 The database schema among the operations that we have introduced. The six remaining operations - consists of four relations, whose schemas are: unio11, difference, selection, projection, product, and renaming - form an in- dependent set, none of which can be written in terms of the other five. product (maker, model, type) PC(mode1, speed, ram, hd, rd, p r i c e ) 5.2.11 A Linear Notation for Algebraic Expressions ~aptop(mode1,speed, ram, hd, screen, p r i c e ) Printer (model, c o l o r , type, p r i c e ) In Section 5.2.8 we used trees to represent complex expressions of relational algebra. another alternative is to invent names for the temporary relations that The Product relation gives the manufacturer, model number and type (PC, correspond to the interior nodes of the tree and write a sequence of assignments laptop, or printer) of various products. We assume for convenience that model that create a value for each. The order of the assignments is flexible, as long numbers are unique over all manufacturers and product types; that assumption as the children of a node N have had their values created before we attempt to is not realistic, and a real database would include a code for the manufacturer create the value for N itself. as part of the model number. The P relation gives for each model number C The notation we shall use for assignment statements is: that is a PC the speed (of the processor, in megahertz), the amount of RAM 1. A relation name and parenthesized list of attributes for that relation. The (in megabytes), the size of the hard disk (in gigabytes), the speed and type name Answer will be used conventionally for the result of the final step: of the removable disk (CD or DVD), and the price. The Laptop relation is i.e.; the name of the relation a t the root of the expression tree. similar, except that the screen size (in inches) is recorded in place of information about the removable disk. The Prinzer relation records for each printer model 2. The assignment symbol :=. whether the printer produces color output (true. if so), the process type (laser, 3. .4ny algebraic expression on the right. We can choose to use only one ink-jet. or bubble), and the price. operator per assignment, in which case each interior node of the tree gets Some sample data for the relation Product is shown in Fig. 5.10. Sample its own assignment statement. However, it is also permissible to conibine data for the other three relations is shown in Fig. 5.11. Manufacturers and several algebraic operations in one right side, if it is convenient to do so. model numbers haye been "sanitized," but the data is typical of products on sale a t the beginning of 2001. Example 5.14: Consider the tree of Fig. 5.8. One possible sequence of as- Write expressions of relational algebra to answer the follo~ving queries. You signments to evaluate this expression is: may use the linear notation of Section 5.2.11 if you wish. For the data of Figs. R ( t , y , l , i , s , p ) := ~len~th>loo(Movie) 5.10 and 3.11, show the result of your query. However, your answer should work S ( t ,y,l , i , s s p ) := UstudioNarne=~fax' (Movie) for arbitrary data, not just the data of these figures. T ( t , y , l , i . s . p ) := R n S Answer(title, year) := s t , F o x L byte? Notice that we get renaming "for free," since we can use any attributes and relation name we wish for the left side of an assignment. The last two steps c) Find the model nunlber and price of all products (of ally type) made by compute the intersection and the projection in the obvious way. manufacturer B. It is also permissible to combine some of the steps. For instance, we could combine the last two steps and write: d) Find the model numbers of all color laser printers. R(t , Y ,1 , i ,s ,p) := u,ength2100 - - (Movie) e) Find those manufacturers that sell Laptops. but not PC's. S ( t ,y , l ,i ,S ,p) := (TstudioName='~ox' (Movie) Answerctitle, year) := T ~ , ~ (n S) R *! f) Find those hard-disk sizes that occur in two or more PC's. 'Source: manufacturers' \Veb pages and Please purchase PDF Split-Merge on to remove this watermark.
  17. CHAPTER 5. RELATIONAL ALGEBRA ! ;. 7 . IALGEBRA OF RELATION.4L OPERATIONS -N 209 model ( speed / r a m I hd I rd I price 1001 1 700 1 64 1 10 1 48xCD 1 799 maker model type A 1001 PC A 1002 PC A 1003 PC A 2004 laptop A 2005 laptop A 2006 laptop B 1004 PC B 1005 PC B B B 1006 2001 2002 PC laptop laptop E i (a) Sample data for relation PC B 2003 laptop model 1 speed ram hd screen 1 price C 1C07 PC 2001 1 700 64 5 12.1 1 1448 C 1008 ' pc 2002 800 96 10 15.1 2584 C 2008 l a p t o p 2003 850 64 10 15.1 2738 C 2009 l a p t o p 2004 550 32 5 12.1 999 C 3002 p r i n t e r 2005 600 64 6 12.1 2399 C 3003 p r i n t e r 2006 800 96 20 15.7 2999 C 3006 p r i n t e r 2007 850 128 20 15.0 3099 D 1009 PC 2008 650 64 10 12.1 1249 D 1010 PC 2009 750 256 20 15.1 2599 D 1011 PC 2010 366 64 10 12.1 1499 D 2007 l a p t o p E 1012 PC (b) Sample data for relation Laptop E 1013 PC E 2010 l a p t o p model color tgpe price F 3001 p r i n t e r F 3004 p r i n t e r 3001 true ink-jet 231 G 3005 p r i n t e r 3002 true ink-jet 267 H 3007 p r i n t e r 3003 false laser 390 3004 true ink-jet 439 3005 true bubble 200 Figure 5.10: Sample data for Product 3006 true laser 1999 3007 false laser 350 (c) Sample data for relation P r i n t e r %9 F &; b Figure 5.11: Sample data for relations of Exercise .5.2.1 Please purchase PDF Split-Merge on to remove this watermark.
  18. 8 CHAPTER 5. RELATIONAL ALGEBRA J 4 5.2. AN ALGEBRA OF RELATIOX4L OPERATIONS 211 210 q i ! g) Find those pairs of P C models that have both the same speed and R.A)I. .pair should be listed only once; e.g., list (i,j) but not (j,i). i c1as.r - type country bore - displacement *!! h) Find those manufacturers of at least two different computers (PC's or "i Bismarck bb Germany 15 42000 laptops) with speeds of a t least 700. $ Iowa bb USA 16 46000 Kongo bc Japan 14 32000 !! i) Find the manufacturer(s) of the computer (PC or laptop) with the highest North C a r o l i n a bb UA S 16 37000 available speed. Renown bc G t . Britain 15 32000 Revenge bb G t . Britain 15 29000 !! j) Find the manufacturers of PC's with a t least three different speeds. Tennessee bb UA S 14 32000 !! k) Find the manufacturers who sell exactly three different models of PC. Y amat o Japan 18 65000 I bb Exercise 5.2.2: Draw expression trees for each of your expressions of Exer- cise 5.2.1. (a) Sample data for relation Classes Exercise 5.2.3: Write each of your expressions from Exercise 5.2.1 in the linear notation of Section 5.2.11. Exercise 5.2.4 : This exercise introduces another running example, concerning North Cape 12/26/43 World War I1 capital ships. It involves the following relations: C l a s s e s ( c l a s s , t y p e , country, numGuns, bore, displacement) Ships(name, c l a s s , launched) (b) Sample data for relation B a t t l e s B a t t l e s (name, d a t e ) Outcomes(ship, b a t t l e , r e s u l t ) ship I battle result sunk Ships are built in "classes" from the same design, and the class is usually named California Surigao S t r a i t ok for the first ship of that class. The relation C l a s s e s records the name of thr UUI ok class, the type (bb for battleship or bc for battlecruiser), the country that built =.d O r. , u Surigao S t r a i t sunk the ship, the number of main guns, the bore (diameter of the gun barrel, in North A t l a n t i c sunk inches) of the main guns, and the displacement (weight, in tons). Relation King George V North A t l a n t i c ok Ships records the name of the ship, the name of its class, and the year in which Kirishima Guadalcanal sunk - the ship was launched. Relation B a t t l e s gives the name and date of battles "--ince of Wales rr North A t l a n t i c damaged involving these ships, and relation Outcomes gives the result (sunk, damaged. "A. nudney North A t l a n t i c ok or ok) for each ship in each battle. . bcnarnnorsc - 3 L x, . . L.- I V O ~ C I IL, p e "~ sunk Figures 5.12 and 5.13 give some sample data for these four relation^.^ Sote 1 - that. unlike the data for Exercise 5.2.1. there are some "daneline tnnlrs" in this - -" data. e.g., ships mentioned in Outcomes that are not mentioned in Ships. -r--- --- ,,.- K U L ~ .-. c___L1_ JOULII _.. U ~ Tennessee uuauar~anal ,-...-A,,,.- Surigao S t r a i t I damaged ok ok Write expressions of relational algebra t o answer the following queries. For - - the data of Figs. 5.12 and 3.13, show the result of your query. However: your Washington I 1 a Yamashiro I Guadalcanal dur r g - v S t r a i t c,.,in=n Surigao r u a L * I c.*--; * I I O* -..-I. nu=. answer should work for arbitrary data, not just the dat,a of thcse figures. a) Give the class names and countries of the classes that carried guns of at ( c ) Sample data for relation Outcomes least 16-inch bore. 3Source: J. S . \Vestwood, Fighting Ships of World War I], Follett Publishing, Chicago. 1976 and R. C . Stern, US Battleships in Action, Squadron/Signal Publications, Carrollton. Figure 3.12: Data for Exercise 5.2.4 TS. 1980. Please purchase PDF Split-Merge on to remove this watermark.
  19. 212 CHAPTER 5. RELATIOhrAL A LGEBR.4 5.2. A N =tLGEBRd OF RELATIOATAL OPERATIONS 213 Exercise 5.2.5 : Draw expression trees for each of your expressions of Exer- name 1 class I launched cise 5.2.4. California ( Tennessee 1 1921 Haruna Kongo Exercise 5.2.6: Write each of your expressions from Exercise 5.2.4 in the Hiei Kongo linear notation of Section 5.2.11. Iowa Iowa Kirishima Kongo Exercise 5.2.7: What is the difference bet~veen natural join R w S and the the Kongo Kongo theta-join R S where the condition C is that R.d = S . 4 for each attribute Hissouri Iowa A appearing in the schemas of both R and S? Musashi Yamato Exercise 5.2.8 : ;In operator on relations is said to be monotone if whenever Nw Jersey e we add a tuple to one of its arguments, the result contains all the tuples that Worth Carolina it contained before adding the tuple, plus perhaps more tuples. Which of the Ramillies operators described in this section are monotone? For each, either explain why Renown Repulse Resolution Renown Renown I Revenge 11 1916 1916 1916 it is monotone or give an example showing it is not. Exercise 5.2.9: Suppose relations R and S have n tuples and m tuples, re- Revenge Royal Oak I Revenge Revenge Royal Sovereign Revenge spectively. Give the minimum and maximum numbers of tuples that the results of the follo~vingexpressions can hare. Tennessee Tennessee Washington North Carolina Wisconsin Iowa Yamato Yamato c) uc(R) x S: for sorne condition C. d) vr.(R) - S : for sorne list of attributes L. Figure 5.13: Sample data for relation Ships Exercise 5.2.10: The semijoin of relatioils R and S, written R D
  20. 214 CH'4PTER 5. RELATIONAL ALGEBR-4 5.3. RELATIOiVAL OPERATIOW ON BAGS 5.3 Relational Operations on Bags \vhile a set of tuples (i.e., a relation) is a simple, natural model of data as it might appear in a database, commercial database systems rarely, if ever, are based purely on sets. In some situations, relations as they appear in database systems are permitted to have duplicate tuples. Recall that if a "set" is allon-ed to haye multiple occurrences of a member, then that set is called a bag or muftiset. In this section, nre shall consider relations that are bags rather than sets; that is, we shall allow the same tuple to appear more than once in a Figure 5.15: Bag for Example 5.16 relation. When we refer to a "set," we mean a relation without duplicate tuples; a "bag" means a relation that may (or may not) have duplicate tuples. we used the ordinary projection operator of relational algebra, and therefore eliminated duplicates, the result would be only: Example 5.15: The relation in Fig. 5.14 is a bag of tuples. In it, the tuple (1,2) appears three times and the tuple (3,4) appears once. If Fig. 5.14 were a set-valued relation, we would have to eliminate two occurrences of the tuple (1,2). In a bag-valued relation, we do allow multiple occurrences of the same tuple, but like sets, the order of tuples does not matter. Sote that the bag result, although larger, can be computed more quickly, since there is no need to compare each tuple (1,2) or (3,4) with previously generated tuples. Lloreover. if we are projecting a relation in order to take an aggregate (dis- cussed in Section 5.4). such as "Find the average value of .-I Fig. 5.15." we in could not use the set model to think of the relation projected onto attribute -4. -4s a set, the average value of -4 is 2. because there are only two values of A - 1 and 3 - in Fig. 5.15. and their average is 2. However. if we treat the -4-column in Fig. 5.15 as a bag ( we get the correct average of '4. which is 1.5, Figure 5.14: A bag among the four tuples of Fig. 5.15. 5.3.2 Union, Intersection, and Difference of Bags 5.3.1 Why Bags? When xve take the union of tn-o bags, we add the nunlber of occurrences of each Khen we think about implementing relations efficiently, we can see several rvays tuple. That is, if R is a bag in n-hich the tuple t appears n times, and S is a bag that allowing relations to be bags rather than sets can speed up operations on in which the tuple t appears m times, then in the bag R U S tuple t appears relations. We mentioned at the beginning of Section 5.2 how allowing the result n f m times. Sote that either n or m (or both) can be 0. to be a bag coulcl speed up the union of two relations. For another example. IYlen ~ v eintersect two bags R and S, in \vhich tuple t appears n and when ~ v e a projection, allowing the resulting relation to be a bag (even I\-lien do m times, respectively. in R n S tuple t appears min(n, m) times. f hen we the original relation is a set) lets us work with each tuple indepcndent.1~. \YO If compute R - S. the difference of bags R and S : tuple t appears in R - S ~vanta set as the result, we need to compare each projected tuple with all thc mas(0,r. - m ) times. That is. if t appears in R more times than it appears in other projected tuples, to make sure that each projection appears only oncc. S. then in R - S tuple t appears the number of times it appears in R. minus the However, if we can accept a bag as the result, then we simply project each tuple number of ti~nes appears in 5'. Ho~vever:if t appears at least as many times it and add it to the result; no comparison with other projected tuples is necessary. in S as it appears in R. then t does not appear at all in R - S. Intuitively, occurrences of t in S each "cancel" one occurrence in R. Example 5.16: The bag of Fig. 5.14 could be the result of project,ing the relation shown in Fig. 5.15 onto attributes -4 and B, provided vie allow the Example 5.17: Let R be the relation of Fig. 5.14, that is, a bag in which result to be a bag and do not eliminate the duplicate occurreIices of (1,2). Had tuple (1,2) appears three times and (3.4) appears once. Let S be the bag Please purchase PDF Split-Merge on to remove this watermark.



Đồng bộ tài khoản