A Scalable Parallel Sorting Algorithm Using Exact Splitting

Sorting is one of the most fundamental algorithmic kernels, used by a large fraction of computer applications. This paper proposes a novel parallel sorting algorithm based on exact splitting that combines excellent scaling behavior with universal applicability. In contrast to many existing parallel sorting algorithms, which make limiting assumptions about the input problem or the underlying computation model, our general-purpose algorithm can be used without restrictions on any MIMD-class computer architecture, and it demonstrates its full potential on massively parallel systems with distributed memory. It is comparison-based like most sequential sorting algorithms, handles an arbitrary number of keys per processing element, works deterministically, does not fail in the presence of duplicate keys, minimizes the communication bandwidth requirements, does not require any knowledge of the key-value distribution, and uses only a small and a priori known amount of additional memory. Moreover, our algorithm can be turned into a stable sort without altering the time complexity, and can be made to work in place. The total running time for sorting $n$ elements on $p$ processors is $O\left(\frac{n}{p} \log n + p \log^2 n\right)$. Practical scalability is shown using more than thirty thousand compute nodes. This paper presents the first parallel sorting algorithm to combine all of the aforementioned properties, while laying the foundations to overcome scalability problems for sorting data on the next generation of massively parallel systems.
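
To make the notion of exact splitting concrete, the following is a minimal sequential sketch in Python. It is not the paper's algorithm: it simply merges all locally sorted runs to find, for a target global rank k, the cut position in each run such that the cuts sum to exactly k, even when keys are duplicated. The function names (`exact_split`, `split_evenly`, `_tag`) and the tie-breaking rule by (run index, position) are assumptions made for illustration only.

```python
import heapq
from itertools import islice


def _tag(run, i):
    # Attach (run index, position) to each key so that equal keys
    # are ordered deterministically (assumed tie-breaking rule).
    return ((key, i, j) for j, key in enumerate(run))


def exact_split(runs, k):
    """Return cut positions c with sum(c) == k such that the first c[i]
    elements of every sorted run together are the k smallest elements."""
    cuts = [0] * len(runs)
    merged = heapq.merge(*(_tag(run, i) for i, run in enumerate(runs)))
    for _key, i, _j in islice(merged, k):
        cuts[i] += 1
    return cuts


def split_evenly(runs):
    """Cut positions for the p - 1 boundaries at global ranks t * n // p."""
    n = sum(len(run) for run in runs)
    p = len(runs)
    return [exact_split(runs, t * n // p) for t in range(1, p)]


if __name__ == "__main__":
    # Three simulated processing elements, each holding a sorted run
    # with many duplicates of the key 5.
    runs = [[1, 5, 5, 9], [5, 5, 5, 5], [2, 3, 5, 7]]
    print(split_evenly(runs))
```

Because ties are resolved by run index and position, every boundary lands at an exact global rank, so each simulated processor would receive exactly n / p elements regardless of how many duplicate keys are present. A scalable implementation would locate these cuts without merging all data, which this sketch does not attempt.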
