Parallel sorting on a shared-nothing architecture using probabilistic splitting

The authors consider the problem of external sorting on a shared-nothing multiprocessor. A critical step in the algorithms they consider is determining the range of sort keys to be handled by each processor. They examine two techniques for determining these ranges: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles. They present analytic results showing that probabilistic splitting performs better than exact splitting. Finally, the authors present experimental results from an implementation of sorting with probabilistic splitting in the Gamma parallel database machine.
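As an illustration of the general idea behind probabilistic splitting, the following is a minimal single-process sketch, not the authors' implementation and not the Gamma code. It draws a random sample of the keys, takes evenly spaced elements of the sorted sample as estimated quantiles (the splitters), and routes each key to the "processor" whose range contains it. The function names, the `sample_size` parameter, and the use of in-memory lists in place of disk-resident runs are all assumptions made for the example.

```python
import bisect
import random


def choose_splitters(keys, num_processors, sample_size, rng):
    """Estimate num_processors - 1 splitting keys from a random sample."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # The i-th splitter approximates the (i / num_processors) quantile.
    return [sample[(i * len(sample)) // num_processors]
            for i in range(1, num_processors)]


def partition(keys, splitters):
    """Route each key to the bucket (processor) whose key range contains it."""
    buckets = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Keys equal to a splitter go to the bucket on its right.
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return buckets


def parallel_sort_sketch(keys, num_processors, sample_size, seed=0):
    """Simulate the split / redistribute / local-sort phases in one process."""
    rng = random.Random(seed)
    splitters = choose_splitters(keys, num_processors, sample_size, rng)
    buckets = partition(keys, splitters)
    # Each simulated processor sorts its own range independently.
    sorted_buckets = [sorted(bucket) for bucket in buckets]
    merged = [key for bucket in sorted_buckets for key in bucket]
    return merged, [len(bucket) for bucket in buckets]
```

Because the ranges are disjoint and ordered, concatenating the locally sorted buckets yields the fully sorted output. The bucket sizes returned alongside it show how evenly the sampled splitters balanced the load; a larger sample generally gives a more even split at the cost of more sampling work.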

[1]  W. Donald Frazer,et al.  Samplesort: A Sampling Approach to Minimal Storage Tree Sorting , 1970, JACM.

[2]  Wlodzimierz Dobosiewicz,et al.  Sorting by Distributive Partitioning , 1978, Inf. Process. Lett..

[3]  Jim Gray,et al.  The convoy phenomenon , 1979, OPSR.

[4]  David J. DeWitt,et al.  Parallel algorithms for the execution of relational database operations , 1983, TODS.

[5]  J. S. Huang,et al.  Parallel sorting and data partitioning by sampling , 1983 .

[6]  Patrick Valduriez,et al.  Join and Semijoin Algorithms for a Multiprocessor Database Machine , 1984, TODS.

[7]  David J. DeWitt,et al.  Design and implementation of the Wisconsin storage system , 1985, Softw. Pract. Exp..

[8]  Michael Stonebraker,et al.  The Case for Shared Nothing , 1985, HPTS.

[9]  Edmund A. Lamagna,et al.  An Adaptive Method for Unknown Distributions in Distributive Partitioned Sorting , 1985, IEEE Trans. Computers.

[10]  Miron Livny,et al.  Multi-disk management algorithms , 1987, SIGMETRICS '87.

[11]  Yasuo Yamane,et al.  Parallel Partition Sort for Database Machines , 1987, IWDM.

[12]  Kevin Wilkinson,et al.  Sorting Large Files on a Backend Multiprocessor , 1988, IEEE Trans. Computers.

[13]  Michael J. Quinn.  Parallel sorting algorithms for tightly coupled multiprocessors , 1988, Parallel Comput..

[14]  Bjørn Arild W. Baugstø,et al.  Parallel Sorting Methods for Large Data Volumes on a Hypercube Database Computer , 1989, IWDM.

[15]  Peter J. Varman,et al.  Percentile Finding Algorithm for Multiple Sorted Runs , 1989, VLDB.

[16]  Goetz Graefe.  Parallel external sorting in Volcano , 1989 .

[17]  Honesty C. Young,et al.  A Low Communication Sort Algorithm for a Parallel Database Machine , 1989, VLDB.

[18]  David J. DeWitt,et al.  Parallel database systems: the future of database processing or a passing fad? , 1990, SGMD.

[19]  Jim Gray,et al.  A benchmark of NonStop SQL release 2 demonstrating near-linear speedup and scaleup on large databases , 1990, SIGMETRICS '90.

[20]  Donovan A. Schneider,et al.  The Gamma Database Machine Project , 1990, IEEE Trans. Knowl. Data Eng..

[21]  Jim Gray,et al.  FastSort: a distributed single-input single-output external sort , 1990, SIGMOD '90.

[22]  Guy E. Blelloch,et al.  A comparison of sorting algorithms for the connection machine CM-2 , 1991, SPAA '91.

[23]  Jeffrey F. Naughton,et al.  Sampling Issues in Parallel Database Systems , 1992, EDBT.