Sorting on a Massively Parallel System Using a Library of Basic Primitives: Modeling and Experimental Results

We present a comparative study of implementations of the following sorting algorithms on the Parsytec SC320, a reconfigurable, asynchronous, massively parallel MIMD machine: Bitonic Sort, Odd-Even Merge Sort, Odd-Even Merge Sort with guarded splits merge, and two variants of Samplesort. The experiments are performed on 2- to 5-dimensional wrapped butterfly networks with 8 to 160 processors. We use library functions that provide primitives for global variables and synchronization, and we show that efficient and portable programs can be implemented easily with them. To predict performance, we model the runtime of an access to a global variable by a trilinear function and the runtime of a synchronization by a bilinear function. Our experiments show that, for parallel sorting, this easily applied model is detailed enough to give good runtime predictions. The experiments, which confirm the predictions, show that Odd-Even Merge Sort with guarded splits merge is the fastest method when each processor holds few keys. When there are many keys per processor, a combination of Samplesort and Odd-Even Merge Sort is the fastest method.
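The processor counts quoted above follow from the size of a wrapped butterfly: a d-dimensional wrapped butterfly has d levels of 2^d nodes each, so it contains d·2^d nodes. The short Python check below is an illustration, not code from the study; it assumes one processor per network node, and the function name is hypothetical. It reproduces the range of 8 to 160 processors for d = 2 to 5.

```python
def wrapped_butterfly_size(d: int) -> int:
    """Number of nodes in a d-dimensional wrapped butterfly (d levels x 2**d rows)."""
    if d < 1:
        raise ValueError("dimension must be at least 1")
    return d * 2 ** d


# Assumption: one processor per butterfly node.
for d in range(2, 6):
    print(f"d={d}: {wrapped_butterfly_size(d)} processors")
```

For d = 2, 3, 4, 5 this yields 8, 24, 64 and 160 nodes.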
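Bitonic Sort is Batcher's sorting network: it recursively builds a bitonic sequence (an ascending half followed by a descending half) and then merges it with compare-exchange steps between elements half a block apart. The following is a minimal sequential Python sketch of the network, assuming one key per position and a power-of-two input length; it is not the SC320 implementation. On a machine where each processor holds many keys, a common adaptation replaces every compare-exchange by a merge-split of two sorted blocks, which this sketch does not model.

```python
from typing import List, Sequence


def bitonic_sort(keys: Sequence[int], ascending: bool = True) -> List[int]:
    """Sort `keys` with Batcher's bitonic network (len(keys) must be a power of two)."""
    n = len(keys)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(keys)
    half = n // 2
    # Sort the halves in opposite directions to form a bitonic sequence.
    first = bitonic_sort(keys[:half], True)
    second = bitonic_sort(keys[half:], False)
    return bitonic_merge(first + second, ascending)


def bitonic_merge(seq: List[int], ascending: bool) -> List[int]:
    """Turn a bitonic sequence into a sorted one using compare-exchange steps."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    seq = list(seq)
    for i in range(half):
        # Compare-exchange elements that are half a block apart.
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return bitonic_merge(seq[:half], ascending) + bitonic_merge(seq[half:], ascending)


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the sequence of compare-exchanges does not depend on the key values, the same network can be laid out on a fixed interconnection such as a butterfly.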
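Odd-Even Merge Sort is Batcher's other classic network: it sorts both halves recursively and then merges them by recursively merging the even- and odd-indexed subsequences, followed by one final row of compare-exchanges between neighbours. The Python sketch below is a sequential illustration assuming a power-of-two input length and one key per position; it first generates the comparator list and then applies it. The variant with guarded splits merge studied in the paper is not shown, and the helper names are hypothetical.

```python
from typing import Iterator, List, Tuple

Comparator = Tuple[int, int]


def oddeven_merge(lo: int, hi: int, r: int) -> Iterator[Comparator]:
    """Comparators merging the sorted halves of positions lo..hi (inclusive) at stride r."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)      # even subsequence
        yield from oddeven_merge(lo + r, hi, step)  # odd subsequence
        # Final row: compare-exchange (lo+r, lo+2r), (lo+3r, lo+4r), ...
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo: int, hi: int) -> Iterator[Comparator]:
    """Comparators sorting positions lo..hi (inclusive); hi - lo + 1 must be a power of two."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def oddeven_merge_sort(keys: List[int]) -> List[int]:
    n = len(keys)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    result = list(keys)
    for i, j in oddeven_merge_sort_range(0, n - 1):
        # Compare-exchange: the smaller key ends up at the lower index.
        if result[i] > result[j]:
            result[i], result[j] = result[j], result[i]
    return result


print(oddeven_merge_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```

For eight inputs this network uses 19 comparators, compared with 24 for the bitonic network of the same size.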
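Samplesort draws a random sample of the keys, sorts it, picks p - 1 splitters from it, and routes every key to one of p buckets so that each bucket can be sorted independently; on a parallel machine, each processor would own one bucket. The Python sketch below simulates this sequentially. The oversampling factor, the choice of every `oversample`-th sample element as a splitter, and the function name are assumptions for illustration; the sketch does not reproduce either of the two Samplesort variants measured in the paper.

```python
import bisect
import random
from typing import List, Sequence


def samplesort(keys: Sequence[int], p: int, oversample: int = 4, seed: int = 0) -> List[int]:
    """Sequential simulation of Samplesort with p buckets (one per processor)."""
    sample_size = p * oversample
    if p <= 1 or len(keys) <= sample_size:
        return sorted(keys)  # too few keys to be worth splitting
    rng = random.Random(seed)
    sample = sorted(rng.sample(list(keys), sample_size))
    # p - 1 splitters: every `oversample`-th element of the sorted sample.
    splitters = [sample[i * oversample] for i in range(1, p)]
    buckets: List[List[int]] = [[] for _ in range(p)]
    for key in keys:
        # Bucket b receives keys k with splitters[b-1] <= k < splitters[b].
        buckets[bisect.bisect_right(splitters, key)].append(key)
    result: List[int] = []
    for bucket in buckets:  # on a parallel machine, each processor sorts its own bucket
        result.extend(sorted(bucket))
    return result


data_rng = random.Random(1)
data = [data_rng.randrange(1000) for _ in range(200)]
print(samplesort(data, p=8) == sorted(data))  # prints True
```

Oversampling makes the bucket sizes more even, which matters because the slowest processor, holding the largest bucket, determines the parallel runtime.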
