Algorithms (Part 14)

Shared by: Tran Anh Phuong | File type: PDF | Pages: 10


RADIX SORTING

have straight radix sort, the right-to-left bit-by-bit radix sort described in the example above. The implementation above moves the file from a to t during each distribution counting phase, then back to a in a simple loop. This “array copy” loop could be eliminated if desired by making two copies of the distribution counting code, one to sort from a into t, the other to sort from t into a.

A Linear Sort

The straight radix sort implementation given in the previous section makes b/m passes through the file. By making m large, we get a very efficient sorting method, as long as we have M = 2^m words of memory available. A reasonable choice is to make m about one-fourth the word size (b/4), so that the radix sort is four distribution counting passes. The keys are treated as base-M numbers, and each (base-M) digit of each key is examined, but there are only four digits per key. (This directly corresponds with the architectural organization of many computers: one typical organization is to have 32-bit words, each consisting of four 8-bit bytes. The bits procedure then winds up extracting particular bytes from words in this case, which obviously can be done very efficiently on such computers.) Now, each distribution counting pass is linear, and since there are only four of them, the entire sort is linear, certainly the best performance we could hope for in a sort.

In fact, it turns out that we can get by with only two distribution counting passes. (Even a careful reader is likely to have difficulty telling right from left by this time, so some caution is called for in trying to understand this method.) This can be achieved by taking advantage of the fact that the file will be almost sorted if only the leading b/2 bits of the b-bit keys are used. As with Quicksort, the sort can be completed efficiently by using insertion sort on the whole file afterwards.
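The straight radix sort with b/m distribution counting passes can be sketched roughly as follows. This is a minimal Python sketch rather than the book's Pascal, and it assumes non-negative integer keys of at most b bits with m dividing b. The function name straight_radix_sort, the signature of bits, and the count array are illustrative assumptions. The two arrays simply swap roles after each pass, which is one possible way to avoid the “array copy” loop.

```python
def bits(x, k, j):
    """Return the j bits of x that start k bits from the right (assumed signature)."""
    return (x >> k) & ((1 << j) - 1)


def straight_radix_sort(keys, b=32, m=8):
    """Sort non-negative b-bit integers with b/m distribution counting passes.

    Assumes m divides b. Returns a new sorted list; the input is not modified.
    """
    a = list(keys)
    t = [0] * len(a)                 # the extra array, called t in the text
    M = 1 << m                       # each pass handles one base-M digit
    for p in range(b // m):          # right to left: least significant digit first
        count = [0] * (M + 1)
        for x in a:                  # count how often each digit value occurs
            count[bits(x, p * m, m) + 1] += 1
        for d in range(M):           # prefix sums: first free slot for each digit
            count[d + 1] += count[d]
        for x in a:                  # stable distribution from a into t
            d = bits(x, p * m, m)
            t[count[d]] = x
            count[d] += 1
        a, t = t, a                  # swap roles instead of copying t back into a
    return a
```

With the defaults b=32 and m=8 this makes four passes, each extracting one byte of the key.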
This method is obviously a trivial modification to the implementation above: to do a right-to-left sort using the leading half of the keys, we simply start the outer loop at pass=b div (2*m) rather than pass=1. Then a conventional insertion sort can be used on the nearly-ordered file that results. To become convinced that a file sorted on its leading bits is quite well-ordered, the reader should examine the first few columns of the table for radix exchange sort above. For example, insertion sort run on the file sorted on the first three bits would require only six exchanges.

Using two distribution counting passes (with m about one-fourth the word size), then using insertion sort to finish the job will yield a sorting method that is likely to run faster than any of the others that we’ve seen for large files whose keys are random bits. Its main disadvantage is that it requires an extra array of the same size as the array being sorted. It is possible to eliminate the extra array using linked-list techniques, but extra space proportional to N (for the links) is still required.
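The two-pass variant can be illustrated with a short Python sketch. It is only an approximation of the method under stated assumptions: non-negative keys of at most b bits, m dividing b, and passes numbered from zero, so the outer loop starts at b/(2m). The names leading_bits_sort and insertion_sort are hypothetical.

```python
def insertion_sort(a):
    """Conventional insertion sort; fast when the list is already nearly ordered."""
    for i in range(1, len(a)):
        v = a[i]
        j = i
        while j > 0 and a[j - 1] > v:
            a[j] = a[j - 1]          # shift larger keys one position right
            j -= 1
        a[j] = v
    return a


def leading_bits_sort(keys, b=32, m=8):
    """Distribution counting on the leading half of the bits, then insertion sort."""
    a = list(keys)
    t = [0] * len(a)
    M = 1 << m
    passes = b // m
    for p in range(passes // 2, passes):     # skip the digits in the trailing half
        shift = p * m
        count = [0] * (M + 1)
        for x in a:
            count[((x >> shift) & (M - 1)) + 1] += 1
        for d in range(M):
            count[d + 1] += count[d]
        for x in a:                          # stable distribution from a into t
            d = (x >> shift) & (M - 1)
            t[count[d]] = x
            count[d] += 1
        a, t = t, a
    return insertion_sort(a)                 # clean up the nearly-ordered file
```

The result is correct for any input, because insertion sort always finishes the job; the leading passes only determine how much work is left for it.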
A linear sort is obviously desirable for many applications, but there are reasons why it is not the panacea that it might seem. First, it really does depend on the keys being random bits, randomly ordered. If this condition is not satisfied, severely degraded performance is likely. Second, it requires extra space proportional to the size of the array being sorted. Third, the “inner loop” of the program actually contains quite a few instructions, so even though it’s linear, it won’t be as much faster than Quicksort (say) as one might expect, except for quite large files (at which point the extra array becomes a real liability). The choice between Quicksort and radix sort is a difficult one that is likely to depend not only on features of the application such as key, record, and file size, but also on features of the programming and machine environment that relate to the efficiency of access and use of individual bits. Again, such tradeoffs need to be studied by an expert and this type of study is likely to be worthwhile only for serious sorting applications.
Exercises

1. Compare the number of exchanges used by radix exchange sort with the number of exchanges used by Quicksort for the file 001, 011, 101, 110, 000, 001, 010, 111, 110, 010.
2. Why is it not as important to remove the recursion from the radix exchange sort as it was for Quicksort?
3. Modify radix exchange sort to skip leading bits which are identical on all keys. In what situations would this be worthwhile?
4. True or false: the running time of straight radix sort does not depend on the order of the keys in the input file. Explain your answer.
5. Which method is likely to be faster for a file of all equal keys: radix exchange sort or straight radix sort?
6. True or false: both radix exchange sort and straight radix sort examine all the bits of all the keys in the file. Explain your answer.
7. Aside from the extra memory requirement, what is the major disadvantage to the strategy of doing straight radix sorting on the leading bits of the keys, then cleaning up with insertion sort afterwards?
8. Exactly how much memory is required to do a 4-pass straight radix sort of N b-bit keys?
9. What type of input file will make radix exchange sort run the most slowly (for very large N)?
10. Empirically compare straight radix sort with radix exchange sort for a random file of 1000 32-bit keys.
11. Priority Queues

In many applications, records with keys must be processed in order, but not necessarily in full sorted order and not necessarily all at once. Often a set of records must be collected, then the largest processed, then perhaps more records collected, then the next largest processed, and so forth. An appropriate data structure in such an environment is one which supports the operations of inserting a new element and deleting the largest element. This can be contrasted with queues (delete the oldest) and stacks (delete the newest). Such a data structure is called a priority queue. In fact, the priority queue might be thought of as a generalization of the stack and the queue (and other simple data structures), since these data structures can be implemented with priority queues, using appropriate priority assignments.

Applications of priority queues include simulation systems (where the keys might correspond to “event times” which must be processed in order), job scheduling in computer systems (where the keys might correspond to “priorities” which indicate which users should be processed first), and numerical computations (where the keys might be computational errors, so the largest can be worked on first).

Later on in this book, we’ll see how to use priority queues as basic building blocks for more advanced algorithms. In Chapter 22, we’ll develop a file compression algorithm using routines from this chapter, and in Chapters 31 and 33, we’ll see how priority queues can serve as the basis for several fundamental graph searching algorithms. These are but a few examples of the important role served by the priority queue as a basic tool in algorithm design.

It is useful to be somewhat more precise about how a priority queue will be manipulated, since there are several operations we may need to perform on priority queues in order to maintain them and use them effectively for applications such as those mentioned above. Indeed, the main reason that
priority queues are so useful is their flexibility in allowing a variety of different operations to be efficiently performed on sets of records with keys. We want to build and maintain a data structure containing records with numerical keys (priorities), supporting some of the following operations:

    Construct a priority queue from N given items.
    Insert a new item.
    Remove the largest item.
    Replace the largest item with a new item (unless the new item is larger).
    Change the priority of an item.
    Delete an arbitrary specified item.
    Join two priority queues into one large one.

(If records can have duplicate keys, we take “largest” to mean “any record with the largest key value.”)

The replace operation is almost equivalent to an insert followed by a remove (the difference being that the insert/remove requires the priority queue to grow temporarily by one element). Note that this is quite different from doing a remove followed by an insert. This is included as a separate capability because, as we will see, some implementations of priority queues can do the replace operation quite efficiently. Similarly, the change operation could be implemented as a delete followed by an insert and the construct could be implemented with repeated uses of the insert operation, but these operations can be directly implemented more efficiently for some choices of data structure. The join operation requires quite advanced data structures for efficient implementation; we’ll concentrate instead on a “classical” data structure, called a heap, which allows efficient implementations of the first five operations.

The priority queue as described above is an excellent example of an abstract data structure: it is very well defined in terms of the operations performed on it, independent of the way the data is organized and processed in any particular implementation.
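The difference between replace (an insert followed by a remove) and a remove followed by an insert is easy to see on a small example. The following Python sketch uses a plain unordered list as the priority queue; the function names and the list representation are assumptions made only for illustration, not the book's interface.

```python
def insert(pq, v):
    """Add a new item to the priority queue (an unordered Python list)."""
    pq.append(v)


def remove(pq):
    """Remove and return the largest item."""
    i = max(range(len(pq)), key=lambda j: pq[j])   # position of the largest key
    pq[i], pq[-1] = pq[-1], pq[i]                  # move it to the end
    return pq.pop()


def replace(pq, v):
    """Insert v, then remove the largest; v itself comes back if it is largest."""
    insert(pq, v)
    return remove(pq)


def remove_then_insert(pq, v):
    """The other order: the old largest always leaves and v always stays."""
    largest = remove(pq)
    insert(pq, v)
    return largest
```

Starting from the keys 3, 1, 2 and the new key 5, replace returns 5 and leaves 3, 1, 2 in the queue, while remove_then_insert returns 3 and leaves 1, 2, 5.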
The basic premise of an abstract data structure is that nothing outside of the definitions of the data structure and the algorithms operating on it should refer to anything inside, except through function and procedure calls for the fundamental operations. The main motivation for the development of abstract data structures has been as a mechanism for organizing large programs. They provide a way to limit the size and complexity of the interface between (potentially complicated) algorithms and associated data structures and (a potentially large number of) programs which use the algorithms and data structures. This makes it easier to understand the large program, and makes it more convenient to change or improve the fundamental algorithms. For example, in the present
context, there are several methods for implementing the various operations listed above that can have quite different performance characteristics. Defining priority queues in terms of operations on an abstract data structure provides the flexibility necessary to allow experimentation with various alternatives.

Different implementations of priority queues involve different performance characteristics for the various operations to be performed, leading to cost tradeoffs. Indeed, performance differences are really the only differences allowed by the abstract data structure concept. First, we’ll illustrate this point by examining a few elementary data structures for implementing priority queues. Next, we’ll examine a more advanced data structure, and then show how the various operations can be implemented efficiently using this data structure. Also, we’ll examine an important sorting algorithm that follows naturally from these implementations.

Elementary Implementations

One way to organize a priority queue is as an unordered list, simply keeping the items in an array a[1..N] without paying attention to the keys. Thus construct is a “no-op” for this organization. To insert simply increment N and put the new item into a[N], a constant-time operation. But replace requires scanning through the array to find the element with the largest key, which takes linear time (all the elements in the array must be examined). Then remove can be implemented by exchanging a[N] with the element with the largest key and decrementing N.

Another organization is to use a sorted list, again using an array a[1..N] but keeping the items in increasing order of their keys. Now remove simply involves returning a[N] and decrementing N (constant time), but insert involves moving larger elements in the array right one position, which could take linear time.

Linked lists could also be used for the unordered list or the sorted list.
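The two array organizations just described might look as follows in Python. This is a sketch under assumptions: the class names UnorderedPQ and SortedPQ are hypothetical, Python lists indexed from 0 stand in for a[1..N], and the sorted version sorts its initial items in construct, a detail the text does not specify.

```python
class UnorderedPQ:
    """Priority queue kept as an unordered array."""

    def __init__(self, items=()):
        self.a = list(items)             # construct: no ordering work at all

    def insert(self, v):
        self.a.append(v)                 # constant time

    def remove(self):
        a = self.a
        big = 0
        for i in range(1, len(a)):       # linear scan for the largest key
            if a[i] > a[big]:
                big = i
        a[big], a[-1] = a[-1], a[big]    # exchange it with the last element
        return a.pop()


class SortedPQ:
    """Priority queue kept as an array in increasing order of keys."""

    def __init__(self, items=()):
        self.a = sorted(items)           # assumption: construct sorts the items

    def insert(self, v):
        a = self.a
        a.append(v)
        i = len(a) - 1
        while i > 0 and a[i - 1] > v:    # move larger elements right one position
            a[i] = a[i - 1]
            i -= 1
        a[i] = v

    def remove(self):
        return self.a.pop()              # the largest key is last: constant time
```

Both classes return the same keys in the same order; only the cost of insert and remove differs.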
This wouldn’t change the fundamental performance characteristics for insert, remove, or replace, but it would make it possible to do delete and join in constant time. Any priority queue algorithm can be turned into a sorting algorithm by successively using insert to build a priority queue containing all the items to be sorted, then successively using remove to empty the priority queue, receiving the items in reverse order. Using a priority queue represented as an unordered list in this way corresponds to selection sort; using the sorted list corresponds to insertion sort. As usual, it is wise to keep these simple implementations in mind because they can outperform more complicated methods in many practical situations. For example, the first method might be appropriate in an application where
only a few “remove largest” operations are performed as opposed to a large number of insertions, while the second method would be appropriate if the items inserted always tended to be close to the largest element in the priority queue. Implementations of methods similar to these for the searching problem (find a record with a given key) are given in Chapter 14.

Heap Data Structure

The data structure that we’ll use to support the priority queue operations involves storing the records in an array in such a way that each key is guaranteed to be larger than the keys at two other specific positions. In turn, each of those keys must be larger than two more keys, and so forth. This ordering is very easy to see if we draw the array in a two-dimensional “tree” structure with lines down from each key to the two keys known to be smaller.

This structure is called a “complete binary tree”: place one node (called the root), then, proceeding down the page and from left to right, connect two nodes beneath each node on the previous level until N nodes have been placed. The nodes below each node are called its sons; the node above each node is called its father. (We’ll see other kinds of “binary trees” and “trees” in Chapter 14 and later chapters of this book.) Now, we want the keys in the tree to satisfy the heap condition: the key in each node should be larger than (or equal to) the keys in its sons (if it has any). Note that this implies in particular that the largest key is in the root.

We can represent complete binary trees sequentially within an array by simply putting the root at position 1, its sons at positions 2 and 3, the nodes at the next level in positions 4, 5, 6 and 7, etc., as numbered in the diagram above. For example, the array representation for the tree above is the following:

      1  2  3  4  5  6  7  8  9 10 11 12
      X  T  O  G  S  M  N  A  E  R  A  I
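To see the tree hidden in this array, the keys can be split into levels: position 1, then positions 2 and 3, then 4 through 7, and so on. The following Python sketch does this; the function name levels is a hypothetical helper, not something from the text.

```python
def levels(keys):
    """Split an array-stored complete binary tree into its levels, top to bottom."""
    a = [None] + list(keys)          # shift so that the root sits at position 1
    n = len(keys)
    rows = []
    first = 1                        # position of the leftmost node on this level
    while first <= n:
        last = min(2 * first - 1, n) # a full level spans first .. 2*first - 1
        rows.append(a[first:last + 1])
        first *= 2
    return rows
```

For the twelve keys X T O G S M N A E R A I this gives the rows X, then T O, then G S M N, then A E R A I, with the last level only partly filled.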
This natural representation is useful because it is very easy to get from a node to its father and sons. The father of the node in position j is in position j div 2, and, conversely, the two sons of the node in position j are in position 2j and 2j + 1. This makes traversal of such a tree even easier than if the tree were implemented with a standard linked representation (with each element containing a pointer to its father and sons). The rigid structure of complete binary trees represented as arrays does limit their utility as data structures, but there is just enough flexibility to allow the implementation of efficient priority queue algorithms.

A heap is a complete binary tree, represented as an array, in which every node satisfies the heap condition. In particular, the largest key is always in the first position in the array.

All of the algorithms operate along some path from the root to the bottom of the heap (just moving from father to son or from son to father). It is easy to see that, in a heap of N nodes, all paths have about lg N nodes on them. (There are about N/2 nodes on the bottom, N/4 nodes with sons on the bottom, N/8 nodes with grandsons on the bottom, etc. Each “generation” has about half as many nodes as the next, which implies that there can be at most lg N generations.) Thus all of the priority queue operations (except join) can be done in logarithmic time using heaps.

Algorithms on Heaps

The priority queue algorithms on heaps all work by first making a simple structural modification which could violate the heap condition, then traveling through the heap modifying it to ensure that the heap condition is satisfied everywhere. Some of the algorithms travel through the heap from bottom to top, others from top to bottom. In all of the algorithms, we’ll assume that the records are one-word integer keys stored in an array a of some maximum size, with the current size of the heap kept in an integer N.
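The index arithmetic and the heap condition can be checked with a few lines of Python. This is a sketch under assumptions: the array is stored in a Python list with position 0 unused so that the positions match the text, and the names father, sons, is_heap and path_length are hypothetical.

```python
def father(j):
    """Position of the father of the node in position j (j div 2)."""
    return j // 2


def sons(j, n):
    """Positions of the sons of node j that exist in a heap of n nodes."""
    return [s for s in (2 * j, 2 * j + 1) if s <= n]


def is_heap(a, n):
    """True if a[1..n] satisfies the heap condition; a[0] is unused."""
    return all(a[father(j)] >= a[j] for j in range(2, n + 1))


def path_length(j):
    """Number of nodes on the path from position j up to the root."""
    count = 0
    while j >= 1:
        count += 1
        j //= 2
    return count
```

For the twelve-key example, the node in position 6 has a single son in position 12, and the path from position 12 to the root passes through 4 nodes, in line with the estimate of about lg N.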
Note that N is as much a part of the definition of the heap as the keys and records themselves.

To be able to build a heap, it is necessary first to implement the insert operation. Since this operation will increase the size of the heap by one, N must be incremented. Then the record to be inserted is put into a[N], but this may violate the heap property. If the heap property is violated (the new node is greater than its father), then the violation can be fixed by exchanging the new node with its father. This may, in turn, cause a violation, and thus can be fixed in the same way. For example, if P is to be inserted in the heap above, it is first stored in a[N] as the right son of M. Then, since it is greater than M, it is exchanged with M, and since it is greater than O, it is exchanged with O, and the process terminates since it is less than X. The following heap results (shown here in array form):

      1  2  3  4  5  6  7  8  9 10 11 12 13
      X  T  P  G  S  O  N  A  E  R  A  I  M
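The insertion walk-through above can be replayed with a short Python sketch. It is not the book's Pascal: the heap is assumed to live in a Python list with position 0 unused, the heap size N is taken to be len(a) - 1, and an explicit test k > 1 stops the climb at the root, where a Pascal version could instead place a sentinel in a[0].

```python
def upheap(a, k):
    """Move the key at position k up until its father is not smaller."""
    v = a[k]
    while k > 1 and a[k // 2] < v:   # father smaller than the new key: violation
        a[k] = a[k // 2]             # pull the father down one level
        k //= 2
    a[k] = v                         # final resting place of the new key


def insert(a, v):
    """Add v at the bottom of the heap a[1..N], then restore the heap condition."""
    a.append(v)                      # plays the role of N := N + 1; a[N] := v
    upheap(a, len(a) - 1)
```

Inserting P into the heap X T O G S M N A E R A I moves M and O down one level each and leaves P in position 3, as in the example.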
The code for this method is straightforward. In the following implementation, insert adds a new item to a[N], then calls upheap to fix the heap condition violation at N.

    procedure upheap(k: integer);
    var v: integer;
    begin
    v:=a[k]; a[0]:=maxint;
    while a[k div 2]