Partitioned Parallel Radix Sort

Load balanced parallel radix sort solved the load imbalance problem of parallel radix sort. By redistributing the keys in each round of the radix sort, it gives every processor exactly the same number of keys, thereby reducing the overall sorting time. Load balanced radix sort is currently the fastest known internal sorting method for distributed-memory multiprocessors. However, once the computation time is balanced, the communication time spent on key redistribution emerges as the bottleneck of overall sorting performance. In this report we present a new parallel radix sorter, called partitioned parallel radix sort, that solves the communication problem of balanced radix sort. The new method reduces the communication time by eliminating the redistribution steps. The keys are first sorted in a top-down fashion (left to right, as opposed to right to left) using some of the most significant bits. Once the keys are localized to each processor, the rest of the sorting is confined within each processor, eliminating the need for global redistribution of keys. This enables well-balanced communication and computation across processors. The proposed method has been implemented on three different distributed-memory platforms: IBM SP2, Cray T3E, and a PC cluster. Experimental results with various key distributions indicate that partitioned parallel radix sort shows significant improvements over balanced radix sort: the IBM SP2 shows a 13% to 30% improvement in execution time, the Cray/SGI T3E shows 20% to 100%, and the PC cluster shows an improvement of over 2.5-fold.
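The two-phase idea described above (one partitioning step on the most significant bits, then purely local sorting) can be illustrated with a minimal sequential sketch. It is not the authors' implementation: processors are simulated as Python lists, the names `partitioned_radix_sort`, `local_radix_sort`, `KEY_BITS`, and `msb_bits` are hypothetical, the rule that assigns buckets to processors is an assumed simple heuristic, and the local phase re-sorts the full key width for simplicity rather than only the remaining low-order bits.

```python
KEY_BITS = 32  # assumed key width; keys are non-negative integers below 2**32


def local_radix_sort(keys, bits=KEY_BITS, radix_bits=8):
    """Stable LSD radix sort on the low `bits` bits, run inside one processor."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for b in buckets for k in b]
    return keys


def partitioned_radix_sort(per_proc_keys, msb_bits=8):
    """Simulate P processors: one MSB-based exchange, then local sorting only."""
    p = len(per_proc_keys)
    shift = KEY_BITS - msb_bits

    # Global histogram over the most significant bits
    # (on a real machine this would be a reduction across processors).
    hist = [0] * (1 << msb_bits)
    for keys in per_proc_keys:
        for k in keys:
            hist[k >> shift] += 1
    total = sum(hist)

    # Assumed heuristic: give each processor a contiguous range of buckets,
    # chosen by where the bucket starts in the global key order.
    owner = []
    running = 0
    for count in hist:
        owner.append(min(p - 1, (running * p) // total) if total else 0)
        running += count

    # The single key exchange: every key goes to the owner of its bucket.
    received = [[] for _ in range(p)]
    for keys in per_proc_keys:
        for k in keys:
            received[owner[k >> shift]].append(k)

    # No further communication: each processor sorts what it holds.
    return [local_radix_sort(r) for r in received]
```

Because bucket ownership is non-decreasing in the bucket index, concatenating the per-processor results in processor order yields the globally sorted sequence. How evenly the keys spread across processors in this sketch depends on the key distribution and on `msb_bits`; a whole bucket is never split between processors here.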
