A practical external sort for shared disk MPP's

An external sort has been implemented and analyzed for a shared-disk MPP computer system. In this implementation, we have considered many real-world constraints. Decision support functionality in database systems, for instance, often requires that external sorting be done in place on disk, support variable-length records, and be restartable from any point of interruption with no loss of data. These three constraints, along with the more standard requirements of speed and stability, affect the choice and implementation of the external sorting algorithm. The implementation of the sample sort algorithm described here meets these requirements. Although written using high-level file-processing directives, the implementation sorts a 10 GB file in 1.5 h on a 64-processor Connection Machine CM-5 with a DataVault disk system.
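The following is a rough, in-memory illustration of the general sample sort idea. It is not the authors' CM-5 implementation. The sketch draws a random sample of keys and picks regularly spaced splitters from the sorted sample. It then routes each record to a bucket by binary search and sorts each bucket independently, which in a parallel setting would mean one bucket per processor. The names `sample_sort` and `choose_splitters` and the `oversample` parameter are hypothetical. The sketch does not model in-place disk layout, restartability, or DataVault I/O. It does show two of the stated requirements in miniature: records are variable-length tuples, and the result is stable. Stability holds because equal keys always land in the same bucket and Python's `sorted` is stable.

```python
import bisect
import random
from typing import Any, Callable, List, Sequence, TypeVar

T = TypeVar("T")


def choose_splitters(keys: Sequence[Any], p: int, oversample: int,
                     rng: random.Random) -> List[Any]:
    """Pick p - 1 splitters from a sorted random sample of about p * oversample keys."""
    sample = sorted(rng.sample(list(keys), min(len(keys), p * oversample)))
    step = len(sample) / p
    # Regularly spaced elements of the sorted sample approximate the p-quantiles.
    return [sample[int(i * step)] for i in range(1, p)]


def sample_sort(records: Sequence[T], key: Callable[[T], Any], p: int = 4,
                oversample: int = 8, seed: int = 0) -> List[T]:
    """Hypothetical stable sample sort sketch (assumes p >= 1): sample, split, partition, sort."""
    if not records:
        return []
    rng = random.Random(seed)
    keys = [key(r) for r in records]
    splitters = choose_splitters(keys, p, oversample, rng)

    # Partition pass: bisect_right sends all records with equal keys to the same
    # bucket, and appending in input order keeps equal keys in their input order.
    buckets: List[List[T]] = [[] for _ in range(p)]
    for record, k in zip(records, keys):
        buckets[bisect.bisect_right(splitters, k)].append(record)

    # Local phase: each bucket would be sorted by its own processor.
    # sorted() is stable, so concatenating the buckets gives a stable sort.
    result: List[T] = []
    for bucket in buckets:
        result.extend(sorted(bucket, key=key))
    return result


if __name__ == "__main__":
    # Variable-length records: (key, payload) with payloads of differing length.
    gen = random.Random(42)
    recs = [(gen.randint(0, 99), bytes(gen.randint(1, 16))) for _ in range(1000)]
    out = sample_sort(recs, key=lambda r: r[0], p=8)
    assert out == sorted(recs, key=lambda r: r[0])
```

A larger `oversample` value makes the buckets more evenly sized, at the cost of sorting a bigger sample. The abstract does not say which sampling ratio the actual implementation uses.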
