In this paper, we present a new parameterized parallel sort algorithm, called Round-Robin Partitioned (RRP), for message-passing (shared-nothing) architectures. The algorithm is parameterized in that a single parameter determines the amount of memory used and allows differing amounts of work to be allocated to different sets of sites. We use pipelining to hide disk I/O time, exploit a high degree of parallelism in all phases, apply sampling to determine the partition key values, and use less memory than previously known methods while requiring the minimum number of physical I/Os. The basic version of the RRP algorithm is simple in terms of both coding and complexity. It does not require disk I/O parallelism or data prefetching within a single process. We develop an analytical model for our algorithm and compare it with four other classes of external parallel sort algorithms. The RRP algorithm is shown to be superior to the other algorithms for almost all configurations.
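The abstract mentions sampling to determine partition key values but gives no details. As a rough illustration of that general idea only, the sketch below picks splitter keys from a random sample and range-partitions records across sites. It is not the RRP algorithm: the function names (`choose_partition_keys`, `partition`, `sample_sort`), the parameters `num_sites` and `sample_size`, and the in-memory lists standing in for sites are all assumptions made for this example, and pipelining, round-robin placement, and disk I/O are left out.

```python
import bisect
import random


def choose_partition_keys(records, num_sites, sample_size, seed=0):
    """Pick up to num_sites - 1 splitter keys from a random sample (hypothetical helper)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(records, min(sample_size, len(records))))
    if not sample:
        return []
    # Evenly spaced sample elements approximate the quantiles of the input.
    return [sample[(i * len(sample)) // num_sites] for i in range(1, num_sites)]


def partition(records, keys):
    """Route each record to the bucket whose key range contains it."""
    buckets = [[] for _ in range(len(keys) + 1)]
    for record in records:
        buckets[bisect.bisect_right(keys, record)].append(record)
    return buckets


def sample_sort(records, num_sites=4, sample_size=64):
    """Range-partition by sampled keys, then sort each bucket locally."""
    keys = choose_partition_keys(records, num_sites, sample_size)
    # Each bucket stands in for one site sorting its own share of the data.
    return [sorted(bucket) for bucket in partition(records, keys)]
```

Because the buckets cover disjoint, increasing key ranges, concatenating the locally sorted buckets in order yields a globally sorted sequence. A larger `sample_size` generally gives more evenly sized buckets at the cost of more sampling work.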