Load-balanced parallel merge sort on distributed memory parallel computers

Sorting can be sped up on parallel computers by partitioning the data and processing the parts in parallel. Merge sort can be parallelized; however, the conventional algorithm performs poorly on distributed memory computers because the number of active (non-idle) processors is halved at each successive merging stage, down to a single processor in the last stage. This paper presents a load-balanced parallel merge sort algorithm in which all processors participate in merging throughout the computation. Data are evenly distributed to all processors, and every processor is kept working during the merging phase. A significant performance improvement has been achieved. Our analysis gives an upper bound of (P - 1)/log P on the speedup of the merge time. We obtained a speedup of 9.6 (the upper bound is 10.5) on a 32-processor Cray T3E when sorting 4M 32-bit integers. The same idea can be applied to parallelize other sorting algorithms.
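The contrast in processor utilization can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `active_processors` is hypothetical, P is assumed to be a power of two, and the load-balanced variant is assumed to use the same log2(P) merge stages as the conventional one. The sketch only counts how many processors are busy at each merge stage.

```python
def active_processors(p: int, load_balanced: bool = False) -> list[int]:
    """Count processors doing merge work at each merge stage.

    Assumptions (not taken from the paper): p is a power of two and
    both variants run log2(p) merge stages.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    stages = p.bit_length() - 1  # log2(p) for a power of two
    if load_balanced:
        # Every processor keeps merging in every stage.
        return [p] * stages
    # Conventional tree merge: the active set halves each stage.
    return [p >> s for s in range(1, stages + 1)]
```

For P = 32 the conventional scheme leaves only one processor working in the final stage, while the load-balanced scheme keeps all 32 busy in each of the five stages.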
