Database Systems: The Complete Book- P12

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:20

lượt xem
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Database Systems: The Complete Book- P12: Database Systems and Database Design and Application courses offered at the junior, senior and graduate levels in Computer Science departments. Written by well-known computer scientists, this introduction to database systems offers a comprehensive approach, focusing on database design, database use, and implementation of database applications and database management systems

Chủ đề:

Nội dung Text: Database Systems: The Complete Book- P12

  1. 1080 CHAPTER 20. IXFORlI.4TION IXTEGRATIO-\- it appears. Figure 20.17 suggests the process of adding a border to the cube useful data cube would replace the serial number by the t x o attributes - model in each dimension, to represent the * u l u e and the aggregated values that and color - to which the serial number connects S a l e s via the dimension table it implies. In this figure u;e see three din~ensions.with the lightest shading Autos. Sotice that if we replace s e r i a l N o by model and c o l o r , then tile cube representing aggregates in one dimension, darker shading for aggregates over no longer has a key among its dimensions. Thus, an entry of the cube ~vould two dimensions, and tlle darkest cube in the corner for aggregation over all hare the total sales price for all automobiles of a given model. with a given three dimensions. Notice that if the number of values along each dirnension is color, by a given dealer, on a given date. reasonably large, but not so large that most poil~ts tlle cube are unoccupied. in There is another change that is useful for the data-cube implementation then the "border" represents only a small addition to the volume of the cube of the S a l e s fact table. Since the C U B E operator normally sums dependent (i.e., the number of tuples in the fact table). In that case, the size of the stored variables, and 13-e might want to get average prices for sales in some category, data C C B E ( Fis not much greater than tlic size of F itself. ) n-e need both the sum of the prices for each category of automobiles (a given model of a given color sold on a given day by a given dealer) and the total number of sales in that category. Thus, the relation S a l e s to which we apply the C C B E operator is Sales(mode1, c o l o r , d a t e , d e a l e r , v a l , c n t ) The attribute v a l is intended t o be the total price of all automobiles for the given model, color. date. and dealer, while c n t is the total number of automo- biles in that category. Xotice that in this data cube. individual cars are not identified: they only affect the value and count for their category. Son-. let us consider the relation c C ~ ~ ( S a 1 e s. I. hypothetical tuple that - ) n-ould be in both S a l e s and ti lo sales). is Figure 20.17: The cube operator augments a data cube with a border of aggre- gations in all combinations of ctinien.ions ('Gobi', 'red', '2001-05-21', ' F r i e n d l y F r e d ' , 45000, 2) A tuple of the table CLBE;(F)that has * in one or more dimensions TI-ill The interpretation is that on May 21; 2001. dealer Friendly Fled sold two red have for each dependent attribute the sum (or another aggregate f ~ ~ n c t i o n ) of Gobis for a total of $45.000. The tuple the values of that attribute in all the tuples that xve can obtain by replacing ('Gobi', *, '2001-05-21', ' F r i e n d l y F r e d ' , 152000, 7) the *'s by real values. In effect. we build into the data the result of aggregating along any set of dimensions. Sotice. holvever. that the C U B E operator does says that on SIay 21, 2001. Friendly Fred sold seven Gobis of all colors, for not support
  2. tells us that total sales of all Aardvark lnodels in all colors, over all time at all all-or-nothing choices of grouping by day or aggregating over all time. For dealers is 198.000 cars for a total price of $3,521,727,000. another esanlple based on our running automobile database, Ive could choose to aggregate dealers completely or not aggregate them a t all. Hon-ever, we could Consider how to answer a query in \\-hich we specify conditions on certain also choose to aggregate by city, by state, or perhaps by other regions, larger attributes of the S a l e s relation and group by some other attributes, n-hile or smaller. Thus: there are a t least sis choices of grouping for time and at least asking for the sum, count, or average price. In the relation are r sales), we four for dealers. look for those tuples t with the fo1lov;ing properties: l\Tllen the number of choices for grouping along each dimension grows, it becomes increasingly expensive to store the results of aggregating by every 1. If the query specifies a value v for attribute a ; then tuple t has v in its possible conlbination of groupings. S o t only are there too many of them, but component for a. they are not as easily organized as the structure of Fig. 20.17 suggests for tlle 2. If the query groups by an attribute a , then t has any non-* value in its all-or-nothing case. Thus, commercial data-cube systems may help the user t o conlponent for a. choose some n~aterializedviews of the data cube. A materialized view is the result of some query, which we choose t o store in the database, rather than 3. If the query neither groups by attribute a nor specifies a value for a. then reconstructing (parts of) it as needed in response t o queries. For the data cube, t has * in its component for a. the vie~vswe n-ould choose to materialize xi11 typically be aggregations of the full data cube. Each tuple t has tlie sum and count for one of the desired groups. If n-e \%-ant The coarser the partition implied by the grouping, the less space the mate- the average price, a division is performed on the sum and count conlponents of rialized view takes. On the other hand, if ire ~vantto use a view to answer a each tuple t. certain query, then the view must not partition any dimension more coarsely E x a m p l e 20.18 : The query than the query does. Thus, to maximize the utility of materialized views, we generally n-ant some large \-iers that group dimensions into a fairly fine parti- SELECT c o l o r , AVG(price) tion. In addition, the choice of vien-s to materialize is heavily influenced by the F O Sales RM kinds of queries that the analysts are likely to ask. .in example will suggest tlie W E E model = 'Gobi' HR tradeoffs in\-011-ed. GROUP BY c o l o r ; INSERT INTO SalesVl is ansn-ered by looking for all tuples of sales) ~ v i t h form the SELECT model, c o l o r , month, c i t y , ('Gobi', C. *, *, 21, n) SUM(va1) A v a l , SUM(cnt) A c n t S S F O S a l e s JOIN Dealers ON d e a l e r = name RM here c is any specific color. In this tuple, v will be the sum of sales of Gobis GROUP BY model, c o l o r , month, c i t y ; in that color, while n will be the nlini!)cr of sales of Gobis in that color. Tlie average price. although not a n attribute of S a l e s or sales) directly. is v / n . Tlie answer to the query is the set of ( c ,vln) pairs obtained fi-om all Figure 20.18: The materialized vien. SalesVl ('Gobi'. c , *, *. v. n ) tuples. Example 20.19 : Let us return to the data cube 20.5.2 Cube ImplementaOion by Materialized Views 11% suggested in Fig. 20.17 that adding aggregations to the cube doesn't cost S a l e s (model, c o l o r , d a t e , d e a l e r , v a l , c n t ) much in tcrms of space. and saves a lot in time \vhen the common kincis of decision-support queries are asked. Ho~vever:our analysis is based on the as- that n e de\-eloped in Esample 20.17. One possible materialized vie\\- groups sumption that queries choose either t o aggregate completely in a dimension dates by nionth and dealers by city. This view. 1%-hich call SalesV1, is 1%-e or not to aggregate a t all. For some dime~isions.there are many degrees of constlucted by the query in Fig. 20.18. This query is not strict SQL. since n-e granularity that could be chosen for a grouping on that dimension. imagine that dates and their grouping units such as months are understood U c have already mentioned thc case of time. xvl-herenumerolls options such by the data-cube system n-ithout being told to join S a l e s with the imaginary as aggregation by weeks, months: quarters, or ycars exist,, in addition to the relation rep~esenting a j s that \ve discussed in Example 20.14. d Please purchase PDF Split-Merge on to remove this watermark.
  3. CHAPTER 20. IiYFORI\IATIOAr IArTEGR.4TION 20.5. DdT.4 CUBES 1055 aggregation of cities into states, probably by accessing the dimension table for INSERT INTO SalesV2 dealers. SELECT model, week, s t a t e , \Ye cannot answer Q2 from SalesV2. Although we could roll-up cities into SUM(va1) A v a l , SUM(cnt) A c n t S S states (i.e.. aggregate the cities into their states) to use SalesV1, we carrrlot F O S a l e s JOIN D e a l e r s ON d e a l e r = name RM roll-up ~veeksinto years, since years are not evenly divided into weeks. and GROUP BY model, week, s t a t e ; data from a week beginning. say, Dec. 29, 2001. contributes to years 2001 and 2002 in a way we carinot tell from the data aggregated by weeks. Figure 20.19: Another materialized view, SalesV2 Finally, a query like 43: SELECT model, c o l o r , d a t e , ~ ~ ~ ( v a l ) Another possible materialized view aggregates colors completely, aggregates F O Sales RM time into u-eeks, and dealers by states. This view, SalesV2, is defined by the GROUP B model, c o l o r , d a t e ; Y query in Fig. 20.19. Either view S a l e s V l or SalesV2 can be used to ansn-er a query that partitions no more finely than either in any dimension. Thus, the can b e anslvered from neither SalesVl nor SalesV2. It cannot be answered query from S a l e s v l because its partition of days by ~nonths too coarse to recover is sales by day, and it cannot be ans~vered from SalesV2 because that view does 41: SELECT model, SUM(va1) not group by color. We would have to answer this query directly from the full FO Sales RM data cube. G O P BY model; RU can be answered either by 20.5.3 The Lattice of Views To formalize the cbservations of Example 20.10. it he!ps to think of a lattice of SELECT model, SUM(va1) possibl~ groupings for each dimension of the cube. The points of the lattice are F O SalesVl RM the ways that we can partition the ~ a l u c of a dimension by grouping according s G O P BY model; RU t o one or more attributes of its dimension table. nB say that partition PI is belo~v partition P2.written PI 5 P2 if and only if each group of Pl is contained within some group of PZ. SELECT model, SUM(va1) All F O SalesV2 RM G O P B model; RU Y On the other hand, the query / Years 1 42: SELECT model, y e a r , s t a t e , SUM(va1) F O S a l e s JOIN D e a l e r s ON d e a l e r = name RM G O P BY model, y e a r , s t a t e ; RU I Weeks Quarters I Months can o n 1 be ans\vered from SalesV1. as Days SELECT model, y e a r , s t a t e , SUM(va1) F O SalesVl RM Figure 20.20: A lattice of partitions for time inter\-als G O P BY model, y e a r , s t a t e ; RU Incidentally. the query inmediately above. like the qu'rics that nggregate time Example 20.20: For the lattice of time partitions n-e might choose the dia- units, is not strict SQL. That is. s t a t e is not ari attribute of SalesVl: only gram of Fig. 20.20. -4 path from some node f i dotvn to PI means that PI 5 4. c i t y is. \Ye rmust assume that the data-cube systenl knol\-s how to perform the These are not the only possible units of time, but they \\-ill serve as an example Please purchase PDF Split-Merge on to remove this watermark.
  4. of what units a s ~ s t e r n might support. Sotice that daks lie below both \reeks and months, but weeks do not lie below months. The reason is that while a group of events that took place in one day surely took place within one \reek and within one month. it is not true that a group of events taking place in one week necessarily took place in any one month. Similarly, a week's group need not be contained within the group cor~esponding one quarter or to one year. to Sales At tlie top is a partition we call "all," meaning that events are grouped into a single group; i.e.. we niake no distinctions among diffeient times. Figure 20.22: The lattice of views and queries from Example 20.19 All I SalesV2; of course it could also be answered froni the full data cube S a l e s , but there is no reason to want to do so if one of the other views is materialized. Q2 State can be answered from either SalesVl or S a l e s , while Q 3 can only be answered I from S a l e s . Each of these relationships is expressed in Fig. 20.22 by the paths City downxard from the queries to their supporting vie~vs. I Placing queries in the lattice of views helps design data-cube databases. Dealer Some recently developed design tools for data-cube systems start with a set of Figure 20.21: A lattice of partitions for automobile dealers queries that they regard as ..typical" of the application a t hand. They then select a set of views to materialize so that each of these queries is above a t least Figure 20.21 shows another lattice, this time for the dealer dimension of our one of the riel\-s, preferably identical to it or very close (i.e., the query and the automobiles example. This lattice is siniplcr: it shows that partitioning sales by view use the same grouping in most of the dimensions). dealer gives a finer partition than partitioning by the city of the dealer. i
  5. 1088 CHAPTER 20. ISFORJlATIOS IXTEGRA4TIOS 20.6. DATA -111-YIA-G 1089 *! Exercise 20.5.3: In Exercise 20.4.1 lve spoke of PC-order data organized as ! Exercise 20.5.9: In Exercise 20.5.3 n e designed a cube suitable for use ~ v i t h a cube. If we are to apply the CCBE operator, we might find it convenient to the CCBE operator. Horn-ever. some of the dimensions could also b e given a non- break several dimensions more finely. For example, instead of one processor trivial lattice structure. In particular, the processor type could b e organized by dimension, we might have one dimension for the type (e.g., AlID Duron or manufacturer (e g., SUT, Intel. .AND. llotorola). series (e.g.. SUN Ult~aSparc. Pentium-IV), and another d~mension the speed. Suggest a set of dimrnsions for Intel Pentium or Celeron. AlID rlthlon, or llotorola G-series), and model (e.g., and dependent attributes that will allow us to obtain answers to a variety of Pentiuni-I\- or G4). useful aggregation queries. In particular, what role does the customer play? .Also, the price in Exercise 20.4.1 referred to the price of one macll~ne,while a) Design tlie lattice of processor types following the examples described several identical machines could be ordered in a single tuple. What should the above. dependent attribute(s) be? b) Define a view that groups processors by series, hard disks by type, and Exercise 20.5.4 : What tuples of the cube from Exercise 20.5.3 would you use removable disks by speed, aggregating everything else. to answer the following queries? c) Define a view that groups processors by manufacturer, hard disks by a) Find, for each processor speed, the total number of computers ordered in speed. and aggregates everything else except memory size. each month of the year 2002. d) Give esamples of qneries that can be ansn-ered from the view of (11) only, the vieiv of (c) only, both, and neither. b) List for each type of hard disk (e.g., SCSI or IDE) and eacli processor type the number of computers ordered. *!! Exercise 20.5.10: If the fact table F to n-hicli n-e apply the C u B E operator is c) Find the average price of computers with 1500 megahertz processors for sparse (i.e.. there are inany fen-er tuples in F than the product of the number each month from Jan., 2001. of possihle values along each dimension), then tlie ratio of the sizes of CCBE(F) and F can be very large. Hon large can it be? ! Exercise 20.5.5 : The computers described in the cube of Exercise 20.5.3 do not include monitors. IVhat dimensions would you suggest to represent moni- tors? You may assume that the price of the monitor is included in the price of 20.6 Data Mining the computer. A family of database applications cal!ed data rnin,ing or knowledge discovery in dntnbases has captured considerable interest because of opportunities to learn Exercise 20.5.6 : Suppose that a cube has 10 dimensions. and eacli dimension surprising facts fro111esisting databases. Data-mining queries can be thought has 5 options for granularity of aggregation. including "no aggregation" and of as an estended form of decision-support querx, although the distinction is in- "aggregate fully.'' How many different views can we construct by clioosing a formal (see the box on -Data-llining Queries and Decision-Support Queries"). granularity in each dinlension? Data nli11i11:. stresses both the cpcry-optimization and data-management com- Exercise 20.5.7 : Show how t o add the following time units to the lattice of ponents of a traditional database system, as 1%-ell suggesting some important as Fig. 20.20: hours, minutes, seconds, fortnights (two-week periods). decades. estensions to database languages, such as language primitix-es that support effi- and centuries. cient sampling of data. In this section, we shall esamine the principal directions data-mining applications have taken. Me then focus on tlie problem called "fre- Exercise 20.5.8: How 15-onld you change the dealer lattice of Fig. 20.21 to quc'iit iteinsets." n-hich has 1-eceiwd the most attention from the database point include -regions." ~ f : of view. a ) A region is a set of states. 20.6.1 Data-Iblining Applications * b) Regions are not com~liensuratewith states. but each city is in only one Broadly. data-mining queries ask for a useful summary of data, often ~vithout region. suggcstir~gthe values of para~netcrsthat would best yield such a summary. This family of problems thus requires rethinking the n a y database systems are c) Regions are like area codes: each region is contained \vithin a state. some to be used to provide snch insights a b o ~ i tthe data. Below are some of tlie cities are in two or more regions. and some regions h a ~ several cities. e applications and problems that are being addressed using very large amounts Please purchase PDF Split-Merge on to remove this watermark.
  6. Please purchase PDF Split-Merge on to remove this watermark.
  7. 1092 CHAPTER 20. I;YFORhlATION INTEGR.4TION (stop words) such as .'and" or 'The." which tend to be present in all docu- Baskets (basket, item) ments and tell us nothing about the content A document is placed in this space according t o t h e fraction of its word occurrences that are any particular where the first attribute is a .'basket ID," o r unique identifier for a market word. For instance, if the document has 1000 word occurrences, two of which basket, and the secoild attribute is the ID of some item found in that basket. are "database." then the doculllent ~vould placed a t the ,002 coordinate in be S o t e that it is not essential for the relation t o come from true ma~ket-basket the dimension cor~espondingto "database." By clustering documents in this data; it could be any relation from which we x a n t t o find associated items. For space, we tend t o get groups of documents that talk about the same thing. ~nstance,the '.baskets" could be documents and the "items" could be words, For instance, documents that talk about databases might have occurrences of in which case n e are really looking for words that appear in many documents words like "data," "query," "lock," and so on, while documents about baseball together. are unlikely to have occurrences of these rvords. The simplest form of market-basket analysis searches for sets of items that The data-mining problem here is to take the data and select the "means" frequently appear together in market baskets. The support for a set of items is or centers of the clusters. Often the number of clusters is given in advance. the number of baskets in which all those items appear. The problem of finding although that number niay be selectable by the data-mining process as ti-ell. frequent sets of ~ t e m s to find, given a support threshold s , all those sets of is Either way, a naive algorithm for choosing the centers so that the average items that have support a t least s. distance from a point to its nearest center is minimized involves many queries; If the number of items in the database is large, then even if we restrict our each of which does a complex aggregation. attention to small sets, say pairs of items only, the time needed to count the support for all pairs of items is enormous. Thus, the straightforward way to solve even the frequent pairs problem - compute the support for each pair of 20.6.2 Finding Frequent Sets of Items items z and j, as suggested by the SQL query in Fig. 20.24 - ~villnot work Now. we shall see a data-mining problem for which algorithms using secondary This query involves joining Baskets r ~ i t hitself, grouping the resulting tuples storage effectively have been developed. The problem is most easily described by the tri-o l t e ~ n s found 111 that tuple, and throwing anay groups where the in terms of its principal application: the analysis of market-basket data. Stores number of baskets is belon- the support threshold s Sote that the condition today often hold in a data warehouse a record of what customers have bought I.item < J. item in the WHERE-clause is there to prevent the same pair from together. That is, a customer approaches the checkout with a .'market basket" being considered in both orders. or for a .'pair" consisting of the same item full of the items he or she has selected. The cash register records all of these twice from being considered at all. items as part of a single transaction. Thus, even if lve don't know anything about the customer, and we can't tell if the customer returns and buys addi- SELECT I.itern, J.item, COUNT(I.basket) tional items. we do know certain items that a single customer b u - s together. FROM Baskets I, Baskets J If items appear together in market baskets more often than ~vouldbe es- WHERE 1.basket = J.basket AND pected, then the store has an opportunity to learn something about how cus- I.item < J.item tomers are likely to traverse the store. The items can be placed in the store so GROUP BY I.item, J.item that customers will tend to take certain paths through the store, and attractive HAVING COUNT(I.basket) >= s; items can be placed along these paths. E x a m p l e 20.22 : .A famous example. which has been clainied by several peo- Figure 20.24: Saive way to find all high-support pairs of items ple; is the discovery that people rvho buy diapcrs are unusually likely also to buy beer. Theories have been advanced foi n.hy that relationship is true. in- cluding tile possibility that peoplc n-110 buy diapers. having a baby at home. ale 20.6.3 T h e A-Priori Algorithm less likely to go out to a bar in the evening and therefore tcnd to drink beer at home. Stores may use the fact that inany customers 15-illwalk through the store There is an optimization that greatly reduccs the running time of a qutry like from where the diapers are to where the beer is. or vice versa. Clever maiketers Fig. 20.21 \\-hen the support threshold is sufficiently large that few pairs meet place beer and diapers near each other, rvitli potato chips in the middle. The it. It is ieaso~iableto set the threshold high, because a list of thousands or claim is that sales of all three items then increase. millions of pairs would not be very useful anyxay; ri-e xi-ant the data-mining query to focus our attention on a sn~allnumber of the best candidates. The We can represent market-basket data by a fact table: a-przorz algorithm is based on the folloiving observation: Please purchase PDF Split-Merge on to remove this watermark.
  8. 1094 C H A P T E R 20. IATFORlI~4TION INTEGR.ATION INSERT INTO Candidates Association Rules SELECT * FROM Baskets A more complex type of market-basket mining searches for associatzon WHERE item IN ( ~ x l e sof the form {il, 22, . . . , i n ) 3 j. TKO possible properties that \ve SELECT item might want in useful rules of this form are: FROM Baskets 1. Confidence: the probability of finding item j in a basket that has GROUP BY item all of {il,i2.. . . ,i n ) is above a certain threshold. e.g., 50%; e.g.. "at HAVING COUNT(*) >= s least 50% of the people who buy diapers buy beer." >; 2. Interest: the probability of finding item j in a basket that has all SELECT I.item, J.item, ~ ~ ~ N ~ ( ~ . b a s k e t ) of {il, i2,. . . , i n } is significantly higher or lower than the probability FROM Candidates I, Candidates J of finding j in a random basket. In statistical terms, j correlates WHERE 1.basket = J.basket AND with {il, i z ,. . . ,i,,), either positively or negatively. The discovery in I.item < J.item Example 20.22 was really that the rule {diapers) + beer has high GROUP BY I.item, J.item interest. HAVING COUNT(*) >= s; Sote that el-en if a n association rule has high confidence or interest. it n-ill tend not to be useful unless the set of items inrrolved has high support. Figure 20.25: Tlie a-priori algorithm first finds frequent items before finding The reason is that if the support is low, then the number of instances of frequent pairs the rule is not large, which limits the benefit of a strategy that exploits the rule. E x a m p l e 20.23 : To get a feel for how the a-priori algorithm helps, consider a supermarket that sells 10,000 different items. Suppose that the average market- basket has 20 items in it. Also assume that the database keeps 1,000,000 baskets If a set of items S has support s. then each subset of A must also have ' as data (a small number compared with what would be stored in practice). support a t least s. Then the Baskets relation has 20,000,000 tuples, and the join in Fig. 20.24 (the naive algorithm) has 190,000,000 pairs. This figure represents one million In particular, if a pair of items. say {i. j ) appears in, say, 1000 baskets. then baskets times (y) which is 190: pairs of items. These 190,000,000 tuples must we know there are a t least 1000 baskets with item i and we know there are at all be grouped and counted. least 1000 baskets xvith item j. However, suppose that s is 10,000, i.e., 1% of the baskets. It is impossi- The converse of the above rule is that if we are looking for pairs of items ble that Inore than 20.000,000/10,000 = 2000 items appear in a t least 10,000 ~vith support at least s. we may first eliminate from consideration any item that baskets. because there are only 20,000.000 tuples in Baskets, and any item ap- does not by itself appear in a t least s baskets. The a-priorz algorltl~m ans11-ers pearing in 10.000 baskets appears in at least 10,000 of those tuples. Thus: if we the same query as Fig. 20.24 by: use the a-priori algoritllrn of Fig. 20.25, the subquery that finds the candidate ite~ns cannot produce more than 2000 items. and I\-ill probably produce many 1. First finding the srt of candidate nte~ns those that appear in a sufficient - fewer than 2000. number of baskets by thexnsel~es and then - \\'e cannot he sure ho~v large Candidates is. since in the norst case all the items that appear in Baskets will appear in at least 1%of them. Honever. in 2. Running the query of Fig. 20.24 on only the candidate items. practice Candidates will be considerably smaller than Baskets. if the threshold s is high. For sake of argument, suppose Candidates has on the average 10 The a-priori algorithnl is thus summarized by the sequence of two SQL queries itelns per basket: i.e., it is half the size of Baskets. Then the join of Candidates in Fig. 20.25. It first computes Candidates. the subset of the Baskets relation with itself in step (2) has 1,000,000 times (y) = 45,000,000 tuples, less than i~hose iter~ls ha\-c high support by theniselves. then joins Candidates ~vith itself. 11-1 of the number of tuples in the join of Baskets ~ - i t h itself. \Ye ~vould as in the naive algorithm of Fig. 20.24. thtis expect the a-priori algorithm to run in about 111 the time of the naive Please purchase PDF Split-Merge on to remove this watermark.
  9. 1096 C H A P T E R 20. IlYFORM-rlTI0.V INTEGRATION 20.7. SC'AIAI,4RY OF C H A P T E R 20 1097 algorithm. In common situations, where C a n d i d a t e s has much less than half Exercise 20.6.3: Using the baskets of Exercise 20.6.1, answer the following: tlie tuples of B a s k e t s , the improvement is even greater, since running time shrinks quadratically with the reduction in the number of tuples involved in a) If the support threshold is 35%, what is the set of candidate triples? the join. b) If the support threshold is 35%, what sets of triples are frequent? 20.6.4 Exercises for Section 20.6 Exercise 20.6.1: Suppose we are given the eight "market baskets" of Fig. 20.7 Summary of Chapter 20 20.26. + Integration of Information: Frequently, there exist a variety of databases or other information sources that contain related information. nTe have B1= {milk, coke, beer) the opportunity to combine these sources into one. Ho~vever,hetero- BP= {milk, pepsi, juice) geneities in the schemas often exist; these incompatibilities include dif- B3 = {milk, beer) fering types, codes or conventions for values, interpretations of concepts, B4 = {coke, juice) and different sets of concepts represented in different schernas. B = {milk, pepsi, beer) g B6 = {milk, beer, juice, pepsi) + Approaches to Information Integration: Early approaches involved "fed- B7 = {coke, beer, juice) eration," where each database would query t h e others in the terms under- B8 = {beer, pepsi) stood by the second. Nore recent approaches involve ~varehousing, where data is translated to a global schema and copied to the warehouse. An alternative is mediation, where a virtual warehouse is created to allolv Figure 20.26: Example market-basket data queries to a global schema; the queries are then translated to the terms of the data sources. * a) As a percentage of the baskets, what is the support of the set {beer, juice)? + Extractors and Wrappers: Warehousing and mediation require compo- b) What is the support of the set {coke, pepsi)? nents a t each source, called extractors and wrappers, respectively. X ma- jor function is to translate querics and results betneen the global schema * c) What is t h e confidence of milk given beer (i.e., of the association rule and the local schema a t the source. {beer) + milk)? + Wrapper Generators: One approach to designing wrappers is t o use tem- d) What is the confidence of juice given milk? plates, which describe how a query of a specific form is translated from the global schema to the local schema. These templates are tabulated and in- e) What is the confidence of coke, given beer and juice? terpreted by a driver that tries to match queries t o templates. The driver * f) If the support threshold is 35% (i.e., 3 out of the eight baskets are needed), may also have the ability to combine templates in various ways, and/or which pairs of items are frequent? perform additional ~vork such as filtering. to answer more con~plex queries. g) If the support threshold is 50%, which pairs of items are frequent? + Capability-Based Optimtzation: The sources for a mediator often are able or ~villingto answer only limited forms of queries. Thus. the mediator ! Exercise 20.6.2 : The a-priori algorithm also may be used to find frequent sets must select a query plan based on the capabilities of its sources, before it of more than ttvo items. Recall that a set S of k items cannot have support at can el-en think about optiniizing the cost of query plans as con\-entional least s t~nlessevery proper subset of S has support at least s. In particular. DBAIS's do. the subsets of X that are of size k - 1 must all have support a t least s. Thus. having found the frequent itemsets (those with support at least s) of size k - 1. + OLAP: An important application of data I
  10. 1098 C H A P T E R 20. IXFORJIIATION IhTTEGR.4TI0.\' 20.8. REFERENCES FOR CH-APTER 20 1099 + ROLAP and AIOLAP: It is frequently useful when building a warehouse [4] is a survey of data-mining techniques, and [13] is a n on-line survey of for OLAP, to think of the data as residing in a multidimensional space. data mining. The a-priori algorithm was del-eloped in [I] and 121. with diniensions corresponding t o independent aspects of the data repre- sented. Systems that support such a vie~v data take either a relational of 1. R. Agranal, T. Imielinski, and A. Sn-ami: '.lIining association rules be- point of view (ROLAP, or relational OLAP systems), or use the special- tween sets of items in large databases," Proc. -ACAi SIGAlOD Intl. Conf. ized data-cube model (lIOL.AP, or multidimensional OLAP systems). on ibfanagement of Data (1993), pp. 203-216. + Star Schernas: In a star schema, each data element (e.g., a sale of an item) 2. R. Agrawal, and R. Srikant, "Fast algorithms for mining association rules," Proc. Intl. Conf. on V e q Large (1994), pp. 487-199. is represented in one relation, called tlie fact table, while inforniation helping to interpret the values along each dimension (e.g.. what kind of 3. S. Chaudhuri and U.Dayal, .'Ail overview of data warehousing and OLAP product is iten1 1234?) is stored in a diinension table for each diinension. technology," SIGAJOD Record 26: 1 (1997), pp. 63-74. + The Cube Operator: A specialized operator called CCBE pre-aggregates 4. U. 52. Fayyad, G. Piatetsky-Shapiro. P. Smyth, and R. Uthurusamy, Ad- the fact table along all subsets of dimensions. It may add little to the space Lances i n Knowledge Discovery and Data hlznzng. AAAI Press, hlenlo needed by the fact table, and greatly increases the speed with which many Park CA, 1996. OLAP queries can be answered. 3. H. Garcia-llolina, Y. Papakonstalltinou. D. Quass. -1. Rajalaman, Y. Sa- + Dzmenszon Lattices and Alaterialized Vzews: A more polverful approach giv. V. \Bssalos. J. D. Ullman, and J . n7idorn) The TSIlIlIIS approach than the CLBE operator, used by some data-cube implementations. is to to mediation: data nlodels and languages. J. Intellzgent Informatzon Sys- establish a lattice of granularities for aggregation along each dimension tems 8:2 (1997), pp. 117-132. (e.g., different time units like days, months, and years). The ~vareliouse is then designed by materializing certain v i e w that aggregate in different 6. J. S. Gray, A. Bosworth, A. Layman. and H. Pirahesh, .'Data cube: a \va!.s along the different dimensions, and the rien- with the closest fit is relational aggregation operator generalizing group-by. cross-tab, and sub- used t o answer a given query. totals." Proc. Intl. Conf. on Data Englneerzng (1996). pp. 132-139. + Data Mining: IVareliouses are also used to ask broad questions that in- 7. -1.Gupta and I. S. SIumick. A.laterioltzed Vieccs: Technzques, Implemcn- tatzons, and Applzcatzons. l I I T Pres4. Cambridge 11-1. 1999 volve not only aggregating on command. as in OL.1P queries, but search- ing for the "right" aggregation. Common types of data mining include 8. V. Harinarayan, - .1Rajaraman, and J . D. Ullman. ~~Implementiiig data clustering data into similar groups. designing decision trees to predict one cubes efficiently." Proc. ACAf SIGilfOD Intl. Conf. on Management of attribute based on the value of others. and finding sets of items that occur Data (1996). pp. 205-216. together frequently. 9. D. Loniet and J. U-idom (eds.). Special i ~ s u e materialized l-ie~vsand on + The A-Priori Algorithm: -An efficiellt \\-a?; to find frequent itemsets is to data warehouses. IEEE Data Erlg?ilcerlng Builet~n18:2 (1395). use the a-priori algorithm. This technique exploits the fact that if a set occurs frequently. then so do all of its subsets. 10. I*.Papakonstantinou. H. Garcia-llolina. arid J . n'idom. "Object ex- change across heterogeneous information sources." Proc. Intl. Conf. on Data Englneerlng (1993). pp 251-260. 20.8 References for Chapter 20 11. I-.Papakonstantinou. . .Gupta. and L. Haas. "Capnl>ilities-base query I Recent smveys of \varehonsing arid related technologics are in [9]. [3]. and [ T I . ren-riting in mediator s!-stems." Conference 011 Par(111el and Distributed Federated systems are surveyed 111 11'21. The concept of tlic mediato1 conies Informntion Systc~ns (1996). ,\l-;lil~il~le as: from [14]. Implementation of mediators and \\-rappers, especially tlie mapper-genera- tor approach. is covered in [5]. Capabilities-based optilnization for iriediators n-as explored in [ll. 131. 12. .A. P. Sheth and J . -1. Larson. "Federated databases for managing dis- The cube operator was proposed in 161. The i~iipleinentationof cubes by tributed. heterogeneous. and autonomous databases." Cornputzng Surreys materialized vie\\-s appeared in 181. 22:3 (1990), pp. 183-236. Please purchase PDF Split-Merge on to remove this watermark.
  11. 14. G. \Viederhold: "Mediators in the architecture of future information sys- terns." IEEE Computer C-25:l (1992), pp. 38-49. Index 15. R. Yerneni, C. Li, H. Garcia-3Iolina, and J. D. Ullman, "Computing capa- bilities of mediators," Proc. ACM SIGMOD Intl. Conf. on Management of Data (1999), pp. 443-454. Alias See AS Abiteboul, S. 21, 187, 1099 ALL 266, 278,437 .Abort 885, 970, 1017, 1026 ALTER TABLE294.334-335 See also Rollback Xnomaiy ABSOLUTE361 See Deletion anomaly, Redun- --ichilles, X.-C. 21 dancy, Update anomaly ACID properties 14 -Anonymous variable 466 See also Atomicity, Consistency, -ASS1 239 Durability, Isolation Antisemijoin 213 .ACR schedule ANY 266 See Cascading rollback .Application server 7 .Action 340 .A-priori algolithm 1093-1096 -ADA 350 Apt, I
  12. INDEX INDEX 575, 791, 794 Block address Celko, J. 314 Combining rule 90-91 See also Dependent attribute, See Database address Cer~tializedlocking 1030 Comer, D. 663 Dimension attribute, Input Block header 576-577 Ceri. S. 60, 348, 712, 1044 Commit 402,885-886,905,996.1023- attribute, Output attribute Body 465 Chamberlin, D. D. 314, 874 1029 Attribute-based check 327-330,339 Boolean 292 Chandra. .A. I 916, Call-level interface Clustered relation 717, 728. 959 Connection 382-383, 393-394. 412 987 See CLI Clustering 1091-1092 Connection record 386-388 Binary large object 595-596 Candidate item 1094 Clustering indes 757-759. 861-862 Consistency 879. 933, 941, 947 Binary relationship 25, 27-28, 32- Capabilities specification 1066 Cobol 350 Constraint 47-54.231-236,315-340, 33, 56 Capability-based plan selection 1064- Cochrane, R. J 348 376.876.879-880 Binding columns 390-392 1070 CODAS1L 4 See also Dependency Binding parameters 392-393 Cartesian product Codd. E. F. 4. 129-130. 237. 502. Constructor funct~on 447 Bit 572 See Product -? rb;, - Containment of value sets 827 Bit string 246. 292 Cascade policy 321-322 Code CONTINUE 375 Bitmap indes 666. 702-710 Cascading rollback 992-904 See Eiroi-colrecting code Coordinator 102-4. 1031 Blair, R. H. 502 Case insensitivity 181, 244 Collation 382 Correctness principle 879-880. 918 Blasgen, 11. W. 785, 916 Catalog 379-381 Collection 570 Correlated subquery 268-270, 814- BLOB Cattell, R. G. G. 188, 424, 462, 604 Collection type 133. 145, 444 817 See Binary large object C/C++ 133,350,385-386.443.570 See also Array. Bag, Dictionary. Cost-based enumeration 821 Block 509 CD-ROM List. Set See also Join ordering See also Disk block See Optical disk Coxnbmer 1052-1053 Cost-based plan selection 835-847. Please purchase PDF Split-Merge on to remove this watermark.
  13. 1069 Database schema Dercferencing 455-456 See also Floppy disk Count 223, 279-280,437 See Relational database schema DESC 251 DISTINCT 277, 279. 429-430 Crash Database state Description record 386 Distributed database 1018-1035 See Media failure See State, of a database Drsigri 15-16, 39-47, 70-71, 135 Distributive law 218. 221, 797 CREATE ASSERTION337 Data-definition language 10. 292 See also Xorrrlalization DlIL CREATE INDEX296-297,318-319 See also ODL, Schema DeWitt, D. J. 785 See Data-manipulation language CREATE M T O 451 EH D Datalog 463-502 Diaz, 0 . 348 Document retrieval 626-630 CREATE ORDRERING459 Data-manipulation language Dicing 1076-1078 Document type definition CREATE SCHEMA 380-381 See Query language Dictionary 144, 161 See DTD CREATE TABLE293-294,316 DATE 247, 293, 571-572 Difference 192-194; 205, 213-216: Dorilai~l63, 382 CREATE TRIGGER341 Date, C. J. 314 260-261,278-279,442,472, Domain constraint 47, 234 CREATE TYPE450 Dayal, U. 348, 1099 729-730,737,742-743,747, Double-buffering 541-544 CREATE VIEW302 DB2 492 751-752: 755,779, 798,803, DR-Ah1 Creating statements 394 DBMS 833 See Dynamic random-access mem- CROSS JOIN 271 See Database management sys- . See also EXCEPT ory Cross product tem Difference rule 127 Drill-down 1079 See Product DDL Digital versatile disk 0 Driver 393 Cube operator 1079-1082 See Data-definition language See also Optical disk DROP 294, 297, 307-308 CURRENT O 358 F Deadlock 14, 885, 939, 1009-1018. Di~llel~sionattribute 1074 DTD 178, 180-18.5 Cursor 355-361, 370, 396 1033 Dimension table 1073-1075 Dump 910 Cycle 928 Decision tree 1090-1091 Dirty buffer 900 Duplicate elilr~iiiatioi~ 221-222. 225. Cylinder 516,534-536,542-543,579 Decision-support query 1070, 1089- Dirty data 405-407. 970-973, 990- 278,725-727.737-740,747. 1090 992 750-751,755.771,773,779. See also OL.iP DISCONNECT383 803-806.818. 833-834 Dangling tuple 228, 323 DECLARE 352-353,356, 367 Disk 515-525 See also DISTINCT Dar~ven,H. 314 Decoinposition 102-105.107-114.123- See also Floppy disk Duplicate-in~perviousgrouping 807 Data cube 667,673,1047,1072-1073, 124 Disk access 297-300 Durability 2 1079-1089 Default value 295 Disk assembly 515-316 D\-D Data disk 552 Deferred constraint checking 323- Disk block 12. 516. 331, 575-577, See Digital versatile disk Data file 606 325 579: 633.694: 717, 733. 735- Dynamic hashirig 652 Data miriing 9, 1047, 1089-1097 Deletion 288-289,410, 399-600.61.5 - 736, 765. 822. 879, 888 See also Exte~isible a ~ h i n gLin- h . Data source 619,630,642-646.651-632. See also Database address ear llashillg See Source 708 Disk controller 517: 522 D y ~ l a ~ nprogrammillg 815.852-857 ic Data structure 503 See also Modification Disk crash Dynamic random-access lnemory 514 Data type 292 Deletion anomaly See Media failure Dynamic SQL 361-363 See also UDT Delobel. C. 130, 188 Disk failure 546-563 Data ~\-areho~ls'9 Delloigan's laws 331 See also Disk crash See also %rehouse Dense index 607-609,611-612.622 Disk head 516 ELEMENT444 Database 2 636 See also Head assembly 538-541. 544 Elevator algoritl~rli Database address 579-580, 582 Dependency Disk I/O 511.519-523.525-526,717, ELSE 368 Database administrator 10 See Constraint, Functional de- 840: 832. 8.56 ELSIF 368 Database element 879, 957 pendency, llultivalued de- Disk scheduling 538 Embedded SQL 349-365. 384 Database management system 1. 9- pendency Disk striping E D 368 N 10 Dependency graph 494 Sce RAID. Striping End-checkpoint action 893 Database programming 1, 15. 17 Dependent attribute 1073 Diskette 519 End-dump action 912 Please purchase PDF Split-Merge on to remove this watermark.
  14. Entity set 24-25,40-44,66-67,155 Faloutsos, C. 663,712 See also Constructor function Gupta, A. 237,785,1099 See also Weak entity set Faulstich, L.C. 188 Functional dependency 82-117,125, Guttman, A. 712 Entitylrelationship model Fayyad, U.1099 231,233 H See E / R model FD Enumeration 137-138,572 See Functional dependency G Haderle, D. J. 916,1044 Environment 379-380 Federated databases 1047,1049-1051 Gaede, iT. 712 Hadzilacos, 1 916,987 ' . Environment record 386-388 FETCH 356,361,389-390 Galliare, H. 502 Haerder, T. 916 Equal-height histogram 837 Field 132.567,570.573 Gap 516 Hall, P.A. V. 874 Equal-width histogram 836 FIFO Garcia-AIolina. H. 188. 566, 1044, Hamilton, G. 424 Equijoin 731,819;826 See First-in-first-out 1099-1100 Hamming code 557,562 Equivalent sets of functional depen- File 504,506,567 Generalized projection Hamming distance 562 dencies 90 See also Sequential file See Grouping Handle 386 E/R diagram 25-26,50,53,57-58 File system 2 Generator method 457-458 Hapner, 11. 424 E/R model 16,23-60,65-82, 173, Filter 844,860-862,868 Generic SQL interface 319-350,354 Harel, D.502 189 Filter, for a wrapper 1060-1061 Geographic information system 666- Harinarayan, V. 237,785,1099 Error-correcting code 557,562 Finkel, R. A. 712 667 Hash function 649-650,652-653,656- Escape character 248 Finkelstein, S. J 502 GetNext 720 657 Eswaran; K. P. 785,987 First normal form 116 Gibson. G. A. 566 See also Partitioned hash func- Event 340 First-come-first-served 956 Global lock 1033 tion Event-condition-action rule First-in-first-out 767-768 Global schema 1051 Hash join 752-753,844,863 See Trigger Fisher, AI. 424 Goodman, N.916,987 See also Hybrid hash join EXCEPT 260,442 Flash memory 514 Gotlieb. L.R.7 . 85 Hash key 649 Exception 142,374-376 Floating-point number Graefe, G 7 . , 8 5 874 Hash table 649-661,665. 749-757. Exclusive lock 940-942 See Real number Grammar 789-791 770.773-774,779 EXEC SQL352 Floppy disk 513 Grant diagram 416-417 See also Dynamic hashing EXECUTE 362,392,410-411 See also Diskette Grant option 415-416 HAVING 277.282-284.441 Executing queries/updates, in JDBC Flush log 886 Grarlti~igprivileges 414-416 Head 463 394-393 FOR 372-375 Granularity, of locks 957-958 Head assembly 515-516 Execution engine 10,15 FOR EACH ROW 341 Graph Head crash EXISTS 266,437 Foreign key 319-322 See Polygraph. Precedence graph, See IIedia failure EXIT 375 Formal data cube 1072 7'iBits-for graph Header Espressioli tree 202,308 See also Data cube Gray. J. S. 424,566.916.987-988, See Block header. Record header Extended projection 222,226-227 Fortran 350 1044,1099 Heap structure 624 Estensible hashing 652-656 Forwarding address 581.599 Greedy algorithm 857-858 Held. G 21 Estensible markup language 4NF 122-125 Grid file 666,676-681,683-684 Hellerstein. J. 11. 21 See SAIL Fragment, of a ieldtion 1020 Griffiths. P.P. 424 Heterogeneous sources 1048 Cstensional predicate 469 Free adornment GROUP BY 277.280-284. 438-441 Heuristic plan selection 843-844 Estent 131-152.170 See Adoriimcnt Group co~nrnit996-997 See also Greedy algorithm Estractor Frequent itemset 1092-1096 Group niode 954-955, 961 Hill climbing 844 See \\lapper Fr~edman.J . H. 712 Groupi~lg 221-226.279.727-728.737. Hinterberger. H. 712 FROM 210,264. 270. 284. 288.428. 740-741.747. 751.755.771. Histogram 836-839 430,789-790 773. 780,806-808.834 Holt. R. C. 1044 Fact table 670,1072-1075,1079 Full outerjoin See also GROUP BY Hopcroft. J. E. 726.852 Fagin, R.129-130, 424,663 See Outerjoin Gulutzan. P. 314 Horizontal deconlposition 1020 Faithfulness 39 Function 365-366,3iG-377 Gunther. 0 . 712 Host language 350-352 Please purchase PDF Split-Merge on to remove this watermark.
  15. Howard, I. H. 129 Inheritance 132, 134-135 See JDBC Lattice, of views 1085-1087 Hsu, 11. 916 See also Isa relationship, Llul- JDBC 349, 393-397 Layman. A 1099 HTXlL 629 tiple inheritance, Subclass Join 112-113,192-193.254-255.270- Leader election 1027 Hull, R. 21 Input action 881, 918 272, j05-506 Leaf 174, 633-634 Hybrid hash join 753-755 Input attribute 802 See also .antisemijoin, CROSS JOIN, Least fixedpoint 481-486,488,499 Insensitive cursor 360 Equijoin, Satural join, Sested- Least-recently used 767-768 Insertion 286-288,410,598-599,615- loop join, Outerjoin, Selec- LEAVE 371 I D 183 620,630,639-642,650-651, tivity, of a join. Semijoin, Left outerjoin 228, 273 Idempotence 230,891, 998 653-660,677-679,691,697- Theta-join, Zig-zag join Left-deep join tree 848-849, 853 Identity 555 698, 708 Join ordcring 818, 8-17-859 Left-recursion 484 IDREF 183 See also Modification Join tree 848 Legacy database 9, 175, 1065 I F 368 Instance, of a relation 64, 66 Joined tuple 198 Legality, of schedules 933-934,941- Imielinski, T. 1099 Instance, of an entity set 27 Juke box 512 942, 947 Immediate constraint checking 323- Instance variable 132 Lewis, P. 11. 11 1013 325 INSTEAD OF344-345 Ley, 11. 21 Immutable object 133 Integer 292-293, 569, 650 Kaiser, G. E. 1044 Li, C. 1100 Impedence mismatch 350-351 Intensional predicate 469 I i a ~ ~ e l l a kP.s188 ~ . LIKE 246-248 I N 266-267,430 Intention lock 959 I
  16. INDEX Logical logging 997-1001 Slegatron 737 (imaginary disk) ,536- ple-key index. Partitioned M ROW/TABLE341-344 E W Logical query plan 714-715. 787- 537 hash function. Quad tree. NEXT 361 788, 817-820, 840-842 IIegatron 747 (imaginary disk) 518- R-tree Xicolas, J.-11. 237 See also Plan selection 519, 521L.522 ~IultidimensiorlalOLAP Siel-ergelt, J. 663, 712 Lomet, D. 604, 1099 Xlegatron 777 (imaginary disk) 52-1 See l I O L + i P Sode 174 Long-duration transaction 1035-1041, hielkanoff, M. -4. 130 ~Iultilel-el indes 610-612 Sonlinear recursion 484, 492 1071 Melton, J. 314, 424 See also B-tree Sonquiescent archiving 910-913 Lookup 609: 613-614,638-639.659- Memory address 582 llultiinedia d a t a 8 Sonquiescent checkpoint 892-895,900- 660,676-677,680,691,707- Slemory hierarchy 507--513 Multipass algorithm 771-774 902, 905-907 708 hfemory size 717. 728, 731 llultiple disks 536-537, 544 Xontri~ial FD Loop 370-371 Merge-sort 527-532 Xlultiple inheritance 150-151 See Trivial FD Lorie, R. -4. 874, 987 See also Two-phase. multi~vay LIultiple-key indes 666, 687-690 Nontrivial MVD Lotus notes 175 merge-sort Multiset See Trivial IIVD Lozano, T. 712 Merging nodes 643-645 See Bag Sonvolatile storage Lletadata 13 Multi-tier architecture 7 See Iblatile storage LRU See Least-recently used Method 133-134,141-143,1.56,167, . llultil-alued dependency 118-127 llultiversion timestamp 975-977 Sormalization 16 Sull character 571 171, 451-452. 569 See also Generator method. Mu- Nult i~vay merge-sort Sull \value 70, 76, 79-80, 228, 248- tator method See Tn-o-phase, multiway merge- 251,283,295,316,318.328, Main memory 508-509,513, 525 Minimum 223, 279, 437 sort 592-394, 1049 Main-memory database systeni 510, hlinker, J. 502 llultiway relationship 28-30,32-33, See also Set-null policy 765 Mirror disk 534, 537-538. 944. 552 148-149 Majorrty locking 1034 Mode, of input or output parame- Jlumick. I. S. 302, 1099 Many-many relationship 28-29. 140- ters 142 36.5-366 JIuliips 350 Objcct 78-79. 133, 13.5. 170. 369 141 Model Mutable object 133 Object broke1 578 Many-one relationship 27, 29. 56, See E / R model, Object-oriented LIutator method 437 Object defiriitiori language 140-141, 154 model. Object-relational model. Mutual recursion 494 See ODL Map table 579-580 Relational model. Semistruc- 111-D Object identifier 569 llarket-basket data 1092 tured data See \Iultivalued dependency Object identity 132-133. 133. 167. See also Association rule l\lodification 297, 321-322. 358-3.59 171 Materialization 859. 863-867 See also Deletion, Insertion. Cp- See also Reference colu~nn Materialized view 1083, 1085-1087 datable vien, Ppdate Saqx i. S. 502 Objcct query language Mattes, S. 348. 502 12ilodule 38-385, 412-413 Satural join 198-199.203. 219.272. See OQL \laximum 223, 279, 437 See also PSZIl 476-177,730-731.737.743- Object-oriented database 765 SIcCarthy, D. R. 348 Modulo-2 sum 747. 7.52-73.5. 760-763. 771- Object-oriented model 132-133.170- IIcCreight, E. 11. 663 See Parity bit 773. 779-780.796.7'98-799. 171. 173 IIcHugli. J. 187 IIohan, C.916, 1044 802.80.5.819.820-532.562- See also Object-relational ~nodcl. IIcJon~s. R. 916 P. 1IOL.IP 1073 8G7 ODL. OQL \Iean tirnc to failu~c321 See also Data rube Savathe. S. B. GO Ob,ject-relational model 8. 16. 131. Media decay .346 Ilonotonicity 497-499 Searest-neighbor query 667-669. 671 166-173.423. 449 461 lIedia failure 546.349,876-877.909- Moore's law 510 672. 681. 683. 690. 693 Observer method 4.56-137 913 Xloto-aka, T. 785 Segated subgoal 465. 467 ODBC Vediator 1048. 1053-1070 11ultidimensional indes 665-666.673- Sested relation 167-169 See CLI Ilegatron 2002 (imaginary DBMS) 67-1 rested-loop join 258. 733-737. 744. ODL 16. 135-166.172. S69 503-507 See also Grid file. kd-tree, SIulti- 769-770. 847. 849-850 ODAIG 187 Please purchase PDF Split-Merge on to remove this watermark.
  17. Offset 572-573 Papadimitriou, C. H. 987. 1044 Platteri515 517 Offset table 580-581, 598 Papakonstantinou, Y. 188. 1099 PL/I 350 Quad tree 666, 695-696 OID Parallel computing 6-7,775-782.983 Pointer swizzling Quantifier See Object identifier Parameter 392, 396-397 See S~vizzling See ALL, A Y EXISTS N, OLrlP 1047.1070-1089 Parity bit 548, 552-553 Polygraph 1004-1008 Quass, D. 187, 237. 712, 785, 1099 See also XIOLAP, ROLAP Parse tree 788-789, 810 Precedence graph 926-930 Query 297. 466, 504-50.3 OLD ROW/TABLE341-344 Parser 713-715. 788-79.5 Precommitted transaction 102.5 See also Decision-support query, Olken, F. 785 Partial-match query 667. 681. 684. Predicate 463-46.2 Lookup, Searest-neighbor OLTP 1070 688-689, 692 Prefetching query, Partial-match query. ON 271 Partition attribute 438 See Double-buffering Range quer3.; Where-am-I On-demand stvizzling 585 Partitioned hash function 666 682- PREPARE 362, 392 query O'Seil, E. 424 684 Prepared statement 394-395 Query compiler 10, 1.2-15, 713-715, O'Seil, P. 424, 712 Pascal 350 787 Preprocessor 793-794 One-one relationship 28-29,140-141 Path expression 426, 428 See also Query optilnization Preservation, of FD's 115-116, 125 One-pass algorithm 722-733, 850, Paton, N. \V. 348 Query execution 713. 870-871 Preservation of value sets 827 862 Pattern 791 Query language 2, 10 On-line analytic processing Patterson, D. A. 566 Price, T. G. 874 See also Datalog, OQL, Rela- See OLIZP PCDATA 180 Primary index 622 tional algebra, SQL On-line transaction processing Pelagatti, G. 1044 See also Dense index, Sparse Query optimization 15, 714-715 See OLTP Pelrer, T. 314 index See also Plan selectioli Open 720 Percentiles Pri~nary key 48, 316-317, 319, 576, Query plan 10, 14 Operand 192 See Equal-height histogram 606 See also Logical quei-y plan, Pliys- Operator 192 Persistence 1, 301 Primary-copy locking 1032-1033 ical query plan. Plan selec- Optical disk 512-513 Pe~sistent stored modules PRIOR 361 tion Opti~nisticconcurrency control See P S l I Privilege 410-421 Query processing 17-18. 506 See Timestamp, Validation Peterson, W. \I/. 664 Probe relation 8-47,830 See also Execution engine. Query Optimization Phantom 961-962 Procedure 365. 376-377 compiler See Query optimization Physical address 579, 582 Product 192-193,197-198,218,254- Query processor OQL 423-449, 570 Ph? sical query plan 714-713. 787. 255,176, 730; 737: 796, 798- See Query compiler, Q u e ~ y ex- O D R BY 251-252, 284 RE 821, 842-845. 8.59-872 799. 803, 805,832 ecution Ordering relationship, for LDT 458- Piatetsky-Shapiro, G. 1099 Projection 112-113; 192-193, 195, Query rewriting 714-715. 788. 810- 460 Pinned block 586-587. 768. 993 205,216-217,242,2.25> 473$ 82 1 Outerjoin 222. 228-230, 272-274 Pipelining 859, 863-867 724-725,737,802-805,823, See also Algebraic law Output action 881. 918 See also Iterator 832, 864 Quicksort 527 Output attribute 802 Pippengcr. K. 663 Quotient 213 See also Extended Overflon block 599. 616-617. 619. Pirahesh, H. 318. 502. 916. 1014 Pushing projectiolls 649. 656 1099 Projection. of FD's 98-100 Overloaded ~ncthod142 Plan selection 1022 Ozsu. 11. T. 1043 Prolog 501 R-\ID 531-.363. 876-877 See also Algorithm zelcctloll. Cal)al~llir\ - Pseudotransitivit~- 101 .\. Rajara~nan. 1099 based plan selecrion. Cobt- PS1I 349: 365-378 R.-\lI disk 514 based enumeration. Cost- PUBLIC 410 Ramakrishnan. R. 502 Pad chalacter 570 based plan selection. Heuiis- tic plan selection. Physi- Pushing projections 802-804, 818 Random-access memory 508 Page 509 cal query plan. Top-don-n Pushing selections 797,800-801.818 Range query 638-639.632.667.673. See also Disk block Putzolo, F. .566, 988 681. 689. 692-693 Palermo. F. P. 874 plan selection Please purchase PDF Split-Merge on to remove this watermark.
  18. Raw-data cube 1072 sion table, Fact table, Probe Root 174, 633 Seeger. B. ill See also Data cube, Fact table relation, Table, View Root tag 179 S~ek time 519-520. 535, 540 Read action 881, 918 Relation schema 62,66, 73, 194,292- Rosenkrantz, D. J. 1045 SELECT 240-243,281.410,428,431- R A COMMITTED407-408 ED 301 Rotation, of disk ,517 432, 789-790 Read lock Relational algebra 189-237,259-260, Rotational latency 520, 510 See also Single-row select See Shared lock 463,471-480,795-808,811 See also Latency Selection 192-193. 196, 205, 217- Read set 979 Relational atom 464 Rothnie, J. B. Jr. 712, 987 218, 221, 241, 243, 245- Read time 970 Relational database schema 24, 62, Roussopoulos, S. 712 246,473-175,724-725,737. R A U C M I T D 407-408 E D N O MT E 190-191,379-381,383 Row-level trigger 342 758-760,777-779,797-801, Read-locks-one-write-locks-all 1034 Relational model 4-5, 61-130, 195- R-tree 666, 696-699 803.818,823-526,844,860- Read-only transaction 403-404,958 164, 173 Rule 465-468 862, 864, 868 Real number 293, 569 See also Nested relation, Object- Run-length encoding 704-707 See also Filter, Pushing selec- Record 567,572-577,598-601 relational model tions, TIT-0-argument selec- See also Sliding records, Spanned Relational OLAP S tion record, Tagged field, Vari- See ROLAP Safe rule 467, 482 Selectivity, of a join 858 able-format record, Variable- length record Relationship 25, 31-32, 40-44, 67- 70, 138-141, 162-163 . Saga 1037-1040 Self-describing data 175 Selinger, P. G. 874 Sagiv, Y. 1099 Record address See also Binary relationship, Isa Salem, K. 566, 1044 See also Griffiths. P. P. See Database address relationship, Many-many re- Salton, G. 664 Selinger-style optimization 845, 857 Record fragment 595 lationship, Many-one rela- Schedule 918, 923-924 Sellis, T. I
  19. Shared memory 775-776, 778 Srikant, R. 1099 Syntactic category 788-789 Transaction 1-2, 12, 17-19,397-409, Shared variable 352-354 Stable storage 548-550 Syntax analysis 877-883,923-924,1020-1021 Shared-nothing machine 776-777 Star schema 1073-1075 See Parser See also Incomplete transaction. Shaw, D. E. 785 Start action 884 System failure 876-877 Long-duration transaction Sheth, A. 1099 START TRANSACTION402 System R 21, 314, 874 Transaction component 1020 Signature 141-142 Start-checkpoint action 893 T~ansaction manager 878, 917 Silberschatz, .988I . Start-dump action 911 Transaction processing Silo 512 Starvation 1016-1017 See Conculrency, Deadlock, Lock- Simon, A. R. 314 State, of a database 879. 1039 Table 293, 301, 303 ing, Logging, Scheduling Simple projection 802 Statement record 386-388 See also Relation Transfer time 520, 535 Simplicity 40 Statement-level trigger 342 Table-scan 716. 719, 721, 861-862, Transitive rule 96-97, 121 Single-row select 354, 370 Statistics 13, 836, 839-810 867-868 Translatio~i table 582-583 Single-value constraint 47, 51 See also Histogram Tag 178 Tree See also Functional dependency, Stearns, R. E. 1045 Tagged field 593 See B-tree, Bushy tree Deci- Many-one relationship Stemming 629 Tanaka. H. 785 sion tree, Expression tree, Size estimation 822-834, 836-839 Stern, R. C. 210 Tape 512 Join tree. kd-tree, Left-deep Size, of a relation 717,822,840, 842 Stonebraker, hl. 21, 785, 1045 Template 1058-1059 join tree, Parse tree, Quad Skeen, D. 1045 Stop word 629 Tertiary memory 512-513 tree, Right-deep join tree, Slicing 1076-1078 Storage manager 12, 17-18 Tertiary storage 6 R-tree Sliding records 616 See also Buffer Thalhein~.B. 60 Tree protocol 963-969 Smalltalk 132 Stratified negation 486-490,194-496 T E 368 HN Trigger 315, 336. 340-34.5, 410-411, Smith, J. 11. 874 Strict locking 994 Theta-join 199-201. 205, 220, 477, 876, 879 Smyth, P. 1099 String 245-247,292 731.796-799.802,805,819- Trivia! FD 92. 103 Snodgrass, R. T. 712 See also Bit string 520. 826-827 Trivial lI1.D 120-122. 127 Sort join 743-747, 844, 862-863 Stripe 676 Theta-outerjoin 229 Tuple 62-63, 170 Sort key 526, 606, 636 Striping 596 Third norrnal form See also Dangling tuple Sorted file Strong, H. R. 663 See 3 S F Tuple variable 256-2.57 See Sequential file Struct 132-133, 137-138, 144-145. Thomas. R. H. 1045 Tuple-based check 327,330-331.339 Sorted sublist 529, 738, 770 157,166-167,431, 446.368 Tliomasian. -1. 988 Turing-complete language 189 Sorting 222,227-228,526-532,737- Structured address 380-581 Thrashing 766 TNO-argumentselection 812-817 749,755-756,771-773.845 Sturgis, H. 566, 1044 3 S F 114-116. 124-125 Two-pass algorithm 737-757 See also O D R BY, Ordering re- RE Subclass 33-36, 76-80, 149-151 Three-valued logic 249-2.51 Two-phase commit 1021-1028 lationship, for UDT Subgoal 465 Thuraisingliam. B. 988 Two-phase locking 936-338 Sort-scan 716-717,719,721-722,868 Subquery 264-276, 431-432. 812- TIME 247-248. 293. 371-572 Tv.-0-phase.multin-ay merge-sort 0. Source 1017 819 Timeout 1009-1010 528-532. 336-537 Spanned record 594-595 See also Correlated subquery TIMESTAMP 248. ,575. 577. 969-979. Type 794. 1019 Sparse indes 609-612,622,636 Subrahmanian. 1 S. 712 ' . 954. 1014-101 7 Type constructor 132 Splitting law 797-798 Suciu. D. 187-188, I099 Tonlbstone 581. 600 T\ PC y s t e n ~132-133.144-146. 171 Splitting nodes 640-612, 645. 698- Sum 223. 279. 437 Top-down plan selt,ctiori 843 699 Superkey 86, 105 TPlIlIS Splitting rule 90-91 Support 1093 See TI!-o-phase. niultin-a>-niergc- UDT 449-4.52 SQL 4-5, 131, 189, 239-424, 449- Supporting relationship 56, 72. 74- sort Ullnla~i. D. 21. 130.474, ,502.530. J. 461,192-500, 789-793 75 Track 515-517. 579 726. 789. 8.52. 1099-1100 SQL agent 385 Swami, A. 1099 Traiger. I. L. 957-988 U D R 410-411 NE SQLSTATE352-353, 356. 374 Swizzling 581-586 Training set 1091 UNDO 3'75 Please purchase PDF Split-Merge on to remove this watermark.
  20. ISDES Undo logging 884-896 Vitter, J. S. 566 Undo/redo logging 887; 903-909 Volatile storage 513-514 Union 192-194, 215-217, 260-262, 278,442,472.722-723.728- W 729,741,747; 751-7521 755, 779, 796-798,803,833 Wade, B. \V. 424 Union rule 127 Wait-die 1014-1017 Zaniolo: C. 130. 712 UNIqUE 316-319 \i7aiting bit 955 Zicari, R. 712 UNKNOWN249-251 Waits-for graph 1010-1012 Zig-zag join 762-763 Unknown value 248 Walker, A. 502 Zip disk 513 Unstratified negation Warehouse 1048, 1051-1053, 1071 Zipfian distribution 632, 823 See Stratified negation Warning protocol 958-961 Unswizzling 586 \\leak entity set 54-59, 71-75, 154 Updatable view 305-307 Weiner, 3. L. 187 Update 289-290,410,601,615-616, \17ell-formed XhlL 178-180 709, 1052 Iiestwood, J. N. 210 See also llodification W E 340, 342 HN Update anomaly 103 W E E 240-241, 243-244, 264, 284, HR Update lock 945-946 288, 428-429, 789 Update record 885-886, 897, 903 Where-am-I query 667, 697 Upgrading locks 943-945,957 WHILE 373 See also Update lock White, S. 424 USAGE 410 IVidom, J. 187-188, 348, 1099 User-defined type Wiederhold, G. 604, 1100 See UDT WITH 492-493 Uthurusamy, R. 1099 IITong,E. 21. 874 Wood, D. 785 IVorkflow 1036 World-Wide-Web consortium 187 Valduriez. P. 1045 Wound-wait 1014-1017 Valid SAIL 178-179 Wrapper 1048,1057-1064 i'alidation 969. 979-985 b l u e cou~lt719, 822, 840 Wrapper generator 1059-1060 Write action 881, 918 VALUES 286 K i t e failure 546, 550 \Bn Gelder. .A. 502 VARCHAR292 See also System failure Variable-foi mat iecord 590,593-594 Ifiite lock 1-ariable-length record 570-571, 589 See Exclusive lock 394.998-999 Write sct 979 IBssalos. 1-.1090 m i t e time 970 Iertical decomposition 1020 IVrite-ahead logging rule 897 \7aiiu. 1. 21 . See also Redo logging fiew 301-312, 345. 1053 Write-through cache 508 See also Materialized riel%* Iiew-serializability 1003-1009 17rtual memory 509-5 10. 578 Please purchase PDF Split-Merge on to remove this watermark.




Đồng bộ tài khoản