# High-Performance Parallel Database Processing and Grid Databases- P6

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

0
69
lượt xem
6

## High-Performance Parallel Database Processing and Grid Databases- P6

Mô tả tài liệu
Download Vui lòng tải xuống để xem tài liệu đầy đủ

High-Performance Parallel Database Processing and Grid Databases- P6: Parallel databases are database systems that are implemented on parallel computing platforms. Therefore, high-performance query processing focuses on query processing, including database queries and transactions, that makes use of parallelism techniques applied to an underlying parallel computing platform in order to achieve high performance.

Chủ đề:

Bình luận(0)

Lưu

## Nội dung Text: High-Performance Parallel Database Processing and Grid Databases- P6

1. 230 Chapter 8 Parallel Universal Qualiﬁcation—Collection Join Queries Case 1: ARRAYS a(250, 75) b(210, 123) f(150, 50, 250) Hash Table 1 Hash Table 2 Hash Table 3 50(f) 150(f) 75(a) 210(b) 123(b) 250(f) 250(a) Case 2: SETS a(250, 75) Sort a(75, 250) b(210, 123) b(123, 210) f(150, 50, 250) f(50, 150, 250) Hash Table 1 Hash Table 2 Hash Table 3 50(f) 150(f) 75(a) 210(b) 123(b) 250(f) 250(a) Figure 8.7 Multiple hash tables collision will occur between h(150,50,25) and collection f (150,50,250). Collision will occur, however, if collection h is a list. The element 150(h/ will be hashed to hash table 1 and will collide with 150( f /. Subsequently, the element 150(h/ will go to the next available entry in hash table 1, as a result of the collision. Once the multiple hash tables have been built, the probing process begins. The probing process is basically the central part of collection join processing. The prob- ing function for collection-equi join is called a function universal. It recursively
2. 8.4 Parallel Collection-Equi Join Algorithms 231 checks whether a collection exists in the multiple hash table and the elements belong to the same collection. Since this function acts like a universal quantiﬁer where it checks only whether all elements in a collection exist in another collection, it does not guarantee that the two collections are equal. To check for the equality of two collections, it has to check whether collection of class A (collection in the multiple hash tables) has reached the end of collection. This can be done by check- ing whether the size of the two matched collections is the same. Figure 8.8 shows the algorithm for the parallel sort-hash collection-equi join algorithm. Algorithm: Parallel-Sort-Hash-Collection-Equi-Join // step 1 (disjoint partitioning): Partition the objects of both classes based on their ﬁrst elements (for lists/arrays), or their minimum elements (for sets/bags). // step 2 (local joining): In each processor, for each partition // a. preprocessing (sorting) // sets/bags only For each collection of class A and class B Sort each collection // b. hash For each object of class A Hash the object into multiple hash tables // c. hash and probe For each object of class B Call universal (1, 1) // element 1,hash table 1 If TRUE AND the collection of class A has reached end of collection Put the matching pair into the result Function universal (element i, hash table j): Boolean Hash and Probe element i to hash table j If matched // match the element and the object Increment i and j // check for end of collection of the probing class. If end of collection is reached Return TRUE If hash table j exists // check for the hash table result D universal (i, j) Else Return FALSE Else Return FALSE Return result Figure 8.8 Parallel sort-hash collection-equi join algorithm
4. 8.5 Parallel Collection-Intersect Join Algorithms 233 Algorithm: Parallel-Hash-Collection-Equi-Join // step 1 (data partitioning) Partition the objects of both classes to be joined based on their ﬁrst elements (for lists/arrays), or their smallest elements (for sets/bags) of the join attribute. // step 2 (local joining): In each processor // a. hash Hash each element of the collection. Collision is handled through the use of linked- list within the same hash table entry. Elements within the same collection are linked in a different dimension using a circular linked-list. // b. probe Probe each element of the collection. Once a matched is not found: Discard current collection, and Start another collection. If the element is found Then Tag the matched node If the element found is the last element in the probing collection Then Perform a traversal If a circle is formed Then Put into the query result Else Discard the current collection Start another collection Repeat until all collections are probed. Figure 8.10 Parallel hash collection-equi join algorithm 8.5 PARALLEL COLLECTION-INTERSECT JOIN ALGORITHMS Parallel algorithms for collection-intersect join queries also exist in three forms, like those of collection-equi join. They are: ž Parallel sort-merge nested-loop algorithm, ž Parallel sort-hash algorithm, and ž Parallel hash algorithm
5. 234 Chapter 8 Parallel Universal Qualiﬁcation—Collection Join Queries There are two main differences between parallel algorithms for collection- intersect and those for collection-equi. The ﬁrst difference is that for collection- intersect, the simplest algorithm is a combination of sort-merge and nested-loop, not double-sort-merge. The second difference is that the data partitioning used in parallel collection-intersect join algorithms is non-disjoint data partitioning, not disjoint data partitioning. 8.5.1 Non-Disjoint Data Partitioning Unlike the collection-equi join, for a collection-intersect join, it is not possible to have non-overlap partitions because of the nature of collections, which may be overlapped. Hence, some data needs to be replicated. There are three non-disjoint data partitioning methods available to parallel algorithms for collection-intersect join queries, namely: ž Simple replication, ž Divide and broadcast, and ž Divide and partial broadcast. Simple Replication With a Simple Replication technique, each element in a collection is treated as a single unit and is totally independent of other elements within the same collection. Based on the value of an element in a collection, the object is placed into a partic- ular processor. Depending on the number of elements in a collection, the objects that own the collections may be placed into different processors. When an object has already been placed at a particular processor based on the placement of an element, if another element in the same collection is also to be placed at the same place, no object replication is necessary. Figure 8.11 shows an example of a simple replication technique. The bold printed elements are the elements, which are the basis for the placement of those objects. For example, object a(250, 75) in processor 1 refers to a placement for object a in processor 1 because of the value of element 75 in the collection. And also, object a(250, 75) in processor 3 refers to a copy of object a in processor 3 based on the ﬁrst element (i.e., element 250). It is clear that object a is replicated to processors 1 and 3. On the other hand, object i(80, 70) is not replicated since both elements will place the object at the same place, that is, processor 1. Divide and Broadcast The divide and broadcast partitioning technique basically divides one class into a number of processors equally and broadcasts the other class to all processors. The performance of this partitioning method will be strongly determined by the size of the class that is to be broadcasted, since this class is replicated on all processors.
6. 8.5 Parallel Collection-Intersect Join Algorithms 235 Class A Class B a(250, 75) r(50, 40) Processor 1 d(4, 237) t(50, 60) (range 0-99) f(150,50, 250) u(3, 1, 2) i(80, 70) w(80, 70) b(210, 123) p(123, 210) Processor 2 c(125, 181) s(125, 180) (range 100-199) f(150, 50, 250) v(100, 102, 270) h(190, 189, 170) a(250, 75) p(123, 210) Processor 3 b(210, 123) q(237) (range 200-299) d(4, 237) v(100, 102, 270) e(289, 290) Figure 8.11 Simple replication f(150, 50, 250) technique for parallel g(270) collection-intersect join There are two scenarios for data partitioning using divide and broadcast. The ﬁrst scenario is to divide class A and to broadcast class B, whereas the second scenario is the opposite. With three processors, the result of the ﬁrst scenario is as follows. The division uses a round-robin partitioning method. Processor 1: class A (a; d; g/ and class B ( p; q; r; s; t; u; v; w/ Processor 2: class A (b; e; h/ and class B ( p; q; r; s; t; u; v; w/ Processor 3: class A (c; f; i/ and class B ( p; q; r; s; t; u; v; w/ Each processor is now independent of the others, and a local join operation can then be carried out. The result from processor 1 will be the pair d q. Processor 2 produces the pair b p, and processor 3 produces the pairs of c s; f r; f t, and i w. With the second scenario, the divide and broadcast technique will result in the following data placement. Processor 1: class A (a; b; c; d; e; f; g; h; i/ and class B ( p; s; v/. Processor 2: class A (a; b; c; d; e; f; g; h; i/ and class B (q; t; w/. Processor 3: class A (a; b; c; d; e; f; g; h; i/ and class B (r; u/. The join results produced by each processor are as follows. Processor 1 pro- duces b– p and c–s, processor 2 produces d–q; f –t, and i–w, and processor 3 produces f –r . The union of the results from all processors gives the ﬁnal query result. Both scenarios will produce the same query result. The only difference lies in the partitioning method used in the join algorithm. It is clear from the examples that the division should be on the larger class, whereas the broadcast should be on the smaller class, so that the cost due to the replication will be smaller. Another way to minimize replication is to use a variant of divide and broadcast called “divide and partial broadcast”. The name itself indicates that broadcasting is done partially, instead of completely.
8. 8.5 Parallel Collection-Intersect Join Algorithms 237 PARTIAL BROADCAST DIVIDE Class A Class B (Divide based on the smallest) (Divide based on the largest) Partition A1 a(250, 75) Partition B1 r(50, 40) (range 0-99) d(4, 237) (range 0-99) t(50,60) f(150, 50, 250) u(3, 1, 2) i(80, 70) w(80, 70) Partition A2 b(210, 123) Partition B2 (range 100-199) (range 100-199) s(125,180) c(125, 181) h(190, 189, 170) Partition B3 p(123, 210) Partition A3 e(289, 290) (range 200-299) q(237) (range 200-299) g(270) v(100, 102,270) Processor 1 : Class A Class B Partition A1 Partition B1 Objects: a, d, f, i Objects: r, t, u, w Processor 2 : Class A Class B Partition A1 Partition B2 Objects: a, d, f, i Object: s Class A Partition A2 Objects: b, c, h Processor 3 : Class A Class B Partition A1 Partition B3 Objects: a, d, f, i Objects: p, q, v Class A Partition A2 Objects: b, c, h Class A Partition A3 Objects: e, q Figure 8.13 Divide and partial broadcast example B and partial broadcast A). This is then called a “two-way” divide and partial broadcast. Figure 8.14(a and b) shows the results of reverse partitioning of the initial partitioning. Note that from processor 1, class A and class B are divided into three
9. 238 1. DIVIDE From Processor 1 From Processor 2 From Processor 3 Partition A11 Partition B11 Partition A21 Partition B21 Partition A31 Partition B31 i(80, 70) i(80, 70) i(80, 70) r(50, 40) t(50, 60) Partition A22 Partition B22 Partition A32 Partition B32 u(3, 1, 2) Partition A12 w(80, 70) c(125, 181) s(125, 180) c(125, 181) v(100, 102, 104) p(123, 210) h(190, 189, 170) h(190, 189, 170) Partition A13 Partition B12 Partition A23 Partition A33 a(250, 75) a(250, 75) a(250, 75) d(4, 237) Partition B13 b(210, 123) Partition B23 d(4, 237) Partition B33 f(150, 50, 250) d(4, 237) f(150, 50, 250) q(237) f(150, 50, 250) b(210, 123) e(289, 290) g(270) Figure 8.14(a) Two-way divide and partial broadcast (divide)
10. 2. PARTIAL BROADCAST From Processor 1 From Processor 2 From Processor 3 Bucket 11 Bucket 21 Bucket 31 Partition A11 Partition B11 Partition A21 Partition B21 Partition A31 Partition B31 Bucket 12 Bucket 22 Bucket 32 Partition A12 Partition B11 Partition A22 Partition B21 Partition A32 Partition B31 Partition B12 Partition B22 Partition B32 Bucket 13 Bucket 23 Bucket 33 Partition A13 Partition B11 Partition A23 Partition B21 Partition A33 Partition B31 Partition B12 Partition B22 Partition B32 Partition B13 Partition B23 Partition B33 Figure 8.14(b) Two-way divide and partial broadcast (partial broadcast) 239