# Thuật toán Algorithms (Phần 16)

Chia sẻ: Tran Anh Phuong | Ngày: | Loại File: PDF | Số trang:10

0
75
lượt xem
15

## Thuật toán Algorithms (Phần 16)

Mô tả tài liệu

Tham khảo tài liệu 'thuật toán algorithms (phần 16)', khoa học tự nhiên, toán học phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả

Chủ đề:

Bình luận(0)

Lưu

## Nội dung Text: Thuật toán Algorithms (Phần 16)

1. 12. Selection and Merging Sorting programs are often used for applications in which a full sort is not necessary. Two important operations which are similar to sorting but can be done much more efficiently are selection, finding the kth smallest element (or finding the k smallest elements) in a file, and merging, combining two sorted files to make one larger sorted file. Selection and merging are intimately related to sorting, as we’ll see, and they have wide applicability in their own right. An example of selection is the process of finding the median of a set of numbers, say student test scores. An example of a situation where merging might be useful is to find such a statistic for a large class where the scores are divided up into a number of individually sorted sections. Selection and merging are complementary operations in the sense that selection splits a file into two independent files and merging joins two inde- pendent files to make one file. The relationship between these operations also becomes evident if one tries to apply the “divide-and-conquer” paradigm to create a sorting method. The file can either be rearranged so that when two parts are sorted the whole file is sorted, or broken into two parts to be sorted and then combined to make the sorted whole file. We’ve already seen what happens in the first instance: that’s Quicksort, which consists basically of a selection procedure followed by two recursive calls. Below, we’ll look at mer- gesort, Quicksort’s complement in that it consists basically of two recursive calls followed by a merging procedure. Both selection and merging are easier than sorting in the sense that their running time is essentially linear: the programs take time proportional to N when operating on N items. But available methods are not perfect in either case: the only known ways to merge in place (without using extra space) are too complex to be reduced to practical programs, as are the only known selection methods which are guaranteed to be linear even in the worst case. 143
2. 144 CHAPTER 12 Selection Selection has many applications in the processing of experimental and other data. The most prominent use is the special case mentioned above of finding the median element of a file: that item which is greater than half the items in the file and smaller than half the items in the file. The use of the median and other order statistics to divide a file up into smaller percentile groups is very common. Often only a small part of a large file is to be saved for further processing; in such cases, a program which can select, say, the top ten percent of the elements of the file might be more appropriate than a full sort. An algorithm for selection must find the kth smallest item out of a file of N items. Since an algorithm cannot guarantee that a particular item is the kth smallest without having examined and identified the k- 1 items which are smaller and the N - k elements which are larger, most selection algorithms can return all of the k smallest elements of a file without a great deal of extra calculation. We’ve already seen two algorithms which are suitable for direct adapta- tion to selection methods. If k is very small, then selection sort will work very well, requiring time proportional to Nk: first find the smallest element, then find the second smallest by finding the smallest among the remaining items, etc. For slightly larger k, priority queues provide a selection mechanism: first insert k items, then replace the largest N - k times using the remaining items, leaving the k smallest items in the priority queue. If heaps are used to imple- ment the priority queue, everything can be done in place, with an approximate running time proportional to N log k. An interesting method which will run much faster on the average can be formulated from the partitioning procedure used in Quicksort. Recall that Quicksort’s partitioning method rearranges an array a[l..N] and returns an integer i such that a[l],. . . ,a[i-l] are less than or equal to a[i] and a[i+l],. . . , a[N] are greater than or equal to a[i]. If we’re looking for the kth smallest element in the file, and we’re fortunate enough to have k=i, then we’re done. Otherwise, if ki then we need to look for the (k-i)th smallest element in the right subfile. This leads immediately to the recursive formulation:
3. SELECTION AAD MERGING 145 procedure seJect(J, r, k: integer); var I; begil 1 if r> J then begin i::=partition(J, r); if i>J+k-1 then seJect(J, i-l, k); if i
4. 146 CHAPTER 12 procedure select(k: integer) ; var v , t 7 9 t 1 ) r: integer; ij begin l:=l; r:=N; while r>l do begin v:=a[r]; i:=l-1; j : = r ; repeat repeat i:=i+l until a[i]>=v; repeat j:=j-1 until ab]
5. SELECTION AND MERGING 147 make it worthwhile to study. Also, we’ll examine a sorting method based on merging. In this chapter we’ll concentrate on programs for two-way merging: pro- grams which combine two sorted input files to make one sorted output file. In the next chapter, we’ll look in more deLii at multiway merging, when more than two files are involved. (The most important application of multiway merging is external sorting, the subject cf that chapter.) To begin, suppose that we have two sorted arrays a [l..M] and b [1..N] of integers which we wish to merge into a third array c [ 1. .M+N] . The following is a direct implementation of the obvious method of successively choosing for c the smallest remaining element from a and b: a[M+l I:=maxint; b[N+l] :=maxint; for k:=l to M+N do if a[il
6. CHAPTER 12 program listmerge(input, output); t y p e link=tnode; node=record k: integer; next: link end; var N, M: integer; z: link; function merge(a, b: link) : link; var c: link; begin c:=z; repeat if at.k
7. SELECTION AND MIERGING 149 (It is convenient to pass the list length as E. parameter to the recursive program: alternatively, it could be stored with the list or the program could scan the list to find its length.) function sort(c: link; fi: integer): link; var a, b: link; i: integer; begin if cf.next=z then sor~,:=c else begin a:=c; for i:= 2 to N div 2 do c:=ct.next; b:=cf.next; cf.next:=z; sort:=merge(sort(a N div 2), sort(b, N-(N div 2))); end ; end ; This program sorts by splitting the list po: nted to by c into two halves, pointed to by a and b, sorting the two halves recursively, then using merge to produce the final result. Again, this program adheres to the convention that all lists end with z: the input list must end with z (and therefore so does the b list); and the explicit instruction cf.next:=z puts z at the end of the a list. This program is quite simple to understand in a recursive formulation even though it actually is a rather sophisticated algorithm. The running time of the program fits the standard “divide-and-conquer” recurrence M(N) = 2M(N/2) + N. The program is thus guaranteed to run in time proportional to NlogN. (See Chapter 4). Our file of sample sorting keys is processed as shown in the following table. Each line in the table shows the result of a call on merge. First we merge 0 and S to get 0 S, then we merge this with A to get A 0 S. Eventually we merge R T with I N to get I N R T, then this is merged with A 0 S to get A I N 0 R S T, etc.:
8. 150 CHAPTER 12 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 A S 0 R T I N G E X A M P L E 0 s A 0 S R T I N I N R T A I N O R S T E G A X A E G X M P E L E L M P A E E G L M P X A A E E G I L M N O P R S T X Thus, this method recursively builds up small sorted files into larger ones. Another version of mergesort processes the files in a slightly different order: first scan through the list performing l-by-l merges to produce sorted sublists of size 2, then scan through the list performing 2-by-2 merges to produce sorted sublists of size 4, then do 4-by-4 merges to get sorted sublists of size 8, etc., until the whole list is sorted. Our sample file is sorted in four passes using this “bottom-up” mergesort: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 A S O R T I N G E X A M P L E A SIO R I TIG N E XIA M L PIE A O R S G I N T A E M X E L P A G I N O R S T A E E L M P X A A E E G I L M N O P R S T X In general, 1ogN passes are required to sort a file of N elements, since each pass doubles the size of the sorted subfiles. A detailed implementation of this idea is given below.