Parallel Merge Sort with Load Balancing

Parallel merge sort is useful for progressively sorting large quantities of data. It must be parallelized carefully, however: the conventional algorithm performs poorly because the number of participating processors is halved at each merging stage, leaving only one processor active in the last stage. The proposed load-balanced merge sort instead utilizes all processors throughout the computation. It distributes the data evenly across all processors at every stage, so every processor works in every phase. This yields a speedup of up to (P−1)/log P, where P is the number of processors. Experimental results demonstrate a speedup of 9.6 (upper bound 10.7) on a 32-processor Cray T3E when sorting 4M 32-bit integers, and a speedup of 2.3 (upper bound 2.8) on an 8-node PC cluster.
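
The contrast between the two schemes can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `conventional_active` and `load_balanced_merge_sort` are hypothetical, P is assumed to be a power of two, and the communication between processors is replaced by plain list operations, so it shows only how the data is distributed at each stage, not the achievable speedup.

```python
from heapq import merge


def conventional_active(p):
    """Processors doing merge work at each stage of the conventional scheme."""
    active = []
    while p > 1:
        p //= 2              # half of the processors drop out per stage
        active.append(p)
    return active


def load_balanced_merge_sort(data, p):
    """Sequentially simulate a load-balanced merge sort on p processors.

    Assumption: p is a power of two. Returns one list per simulated
    processor; concatenated in order, they form the sorted output.
    """
    n = len(data)
    # Initial even distribution: chunk sizes differ by at most one element.
    bounds = [i * n // p for i in range(p + 1)]
    local = [sorted(data[bounds[i]:bounds[i + 1]]) for i in range(p)]

    group = 1  # number of processors that jointly hold one sorted run
    while group < p:
        for start in range(0, p, 2 * group):
            left = [x for q in range(start, start + group) for x in local[q]]
            right = [x for q in range(start + group, start + 2 * group)
                     for x in local[q]]
            merged = list(merge(left, right))
            # Spread the merged run evenly over all 2*group processors,
            # so no processor becomes idle in later stages.
            m, width = len(merged), 2 * group
            for j in range(width):
                local[start + j] = merged[j * m // width:(j + 1) * m // width]
        group *= 2
    return local


if __name__ == "__main__":
    print(conventional_active(8))
    for chunk in load_balanced_merge_sort([9, 3, 7, 1, 8, 2, 6, 4, 5, 0], 4):
        print(chunk)
```

In the conventional scheme the list returned by `conventional_active` shrinks to a single processor, whereas in the simulated load-balanced scheme every processor still holds roughly n/P elements after the final stage.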
