Communication-Efficient Parallel Sorting

We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O((n log n)/p) and a number of communication rounds that is O(log n / log(h+1)) for h = Θ(n/p). The internal computation bound is optimal for any comparison-based sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p ≤ n^(1-1/c) for a constant c ≥ 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n/h) of an arbitrary number of processors in a BSP computer requires Ω(log n / log(h+1)) communication rounds.
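To make the round bound concrete, the following is a minimal sketch, not the paper's sorting algorithm. It evaluates log n / log(h+1) for h = n/p with constant factors ignored, and it simulates one simple way to compute the "or" of n bits with a fan-in (h+1) reduction tree, which uses a number of rounds of the same order as the stated lower bound. The function names `round_bound` and `or_tree_rounds`, and the layout in which each processor starts with a block of h bits, are assumptions made for illustration.

```python
import math


def round_bound(n, p):
    """Evaluate log n / log(h + 1) with h = n / p (constant factors ignored)."""
    h = n / p
    return math.log(n) / math.log(h + 1)


def or_tree_rounds(bits, h):
    """OR the bits with a fan-in (h + 1) reduction tree; return (result, rounds).

    Assumed layout (hypothetical): n >= 1 bits, h >= 1, and each processor
    initially holds one block of h consecutive bits.
    """
    # Local step, no communication: each processor ORs its own block.
    partial = [int(any(bits[i:i + h])) for i in range(0, len(bits), h)]
    rounds = 0
    while len(partial) > 1:
        # One communication round: in each group of up to h + 1 processors,
        # the leader receives at most h values and ORs them with its own.
        partial = [
            int(any(partial[i:i + h + 1]))
            for i in range(0, len(partial), h + 1)
        ]
        rounds += 1
    return partial[0], rounds


if __name__ == "__main__":
    print(round_bound(10**6, 10**3))
    print(or_tree_rounds([0] * 999 + [1], 10))
```

For example, with n = 10^6 and p = 10^3 (so p = n^(1-1/c) with c = 2 and h = 1000), `round_bound` evaluates to slightly less than 2, which matches the claim that the number of rounds is bounded by a constant in this regime. The group size is h+1 rather than h because the receiving processor also contributes its own value, which is why the tree still makes progress when h = 1.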
