Pipelined Parallel Prefix Computations, and Sorting on a Pipelined Hypercube

This paper brings together a number of previously known techniques in order to obtain practical and efficient implementations of the prefix operation for the complete binary tree, hypercube and shuffle-exchange families of networks. For each of these networks, we also provide a "pipelined" scheme for performing k prefix operations in O(k + log p) time on p processors. This implies a similar pipelining result for the "data distribution" operation of Ullman [16]. The data distribution primitive leads to a simplified implementation of the optimal merging algorithm of Varman and Doshi, which runs on a pipelined model of the hypercube [17]. Finally, a pipelined version of the multi-way merge sort of Nassimi and Sahni [10], running on the pipelined hypercube model, is described. Given p processors and n < p log p values to be sorted, the running time of the pipelined algorithm is O(log^2 p / log((p log p)/n)). Note that for the interesting case n = p this yields a running time of O(log^2 p / log log p), which is asymptotically faster than Batcher's bitonic sort [3].
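As a rough illustration of the prefix operation on a complete binary tree, the following is a minimal sequential sketch, not the paper's implementation. It assumes the p input values sit at the leaves of a tree stored in heap order, that p is a power of two, and that the operator is associative; the name `tree_prefix` and the arrays `total` and `left` are hypothetical. An up-sweep computes subtree totals and a down-sweep passes each node the combination of everything to its left, so a single prefix takes about 2 log p tree levels.

```python
from operator import add


def tree_prefix(values, op=add):
    """Inclusive prefix of `values` under an associative `op`.

    Simulates, one tree level at a time, an up-sweep and a down-sweep on a
    complete binary tree whose leaves hold the inputs (illustrative only).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("length must be a positive power of two")

    # Heap layout: node i has children 2i and 2i+1; leaves are p .. 2p-1.
    total = [None] * (2 * p)
    total[p:] = values

    # Up-sweep: each internal node combines the totals of its two subtrees.
    for i in range(p - 1, 0, -1):
        total[i] = op(total[2 * i], total[2 * i + 1])

    # Down-sweep: left[i] combines all leaves strictly left of node i's
    # subtree; None means there is nothing to the left.
    left = [None] * (2 * p)
    for i in range(1, p):
        left[2 * i] = left[i]
        if left[i] is None:
            left[2 * i + 1] = total[2 * i]
        else:
            left[2 * i + 1] = op(left[i], total[2 * i])

    # Each leaf combines what lies to its left with its own value.
    return [
        v if left[p + j] is None else op(left[p + j], v)
        for j, v in enumerate(values)
    ]
```

For example, `tree_prefix([1, 2, 3, 4])` gives the running sums of the input. In a pipelined setting, successive prefix instances would enter the tree one level behind each other, which is the intuition behind the O(k + log p) bound stated above; this sketch does not model that overlap.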

[1]  Lennart Johnsson,et al.  Combining Parallel and Sequential Sorting on a Boolean n–cube , 1984 .

[2]  Sartaj Sahni,et al.  Parallel permutation and sorting algorithms and a new generalized connection network , 1982, JACM.

[3]  Leslie G. Valiant,et al.  A logarithmic time sort for linear size networks , 1982, STOC.

[4]  Jeffrey D. Ullman.  Computational Aspects of VLSI , 1984 .

[5]  Richard J. Anderson,et al.  Parallel Approximation Algorithms for Bin Packing , 1988, Inf. Comput..

[6]  Arnold L. Rosenberg,et al.  Optimal simulations of tree machines , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[7]  Guy E. Blelloch,et al.  Scans as Primitive Parallel Operations , 1989, ICPP.

[8]  Peter J. Varman,et al.  Sorting with Linear Speedup on a Pipelined Hypercube , 1992, IEEE Trans. Computers.

[9]  Gérard M. Baudet,et al.  Optimal Sorting Algorithms for Parallel Computers , 1978, IEEE Transactions on Computers.

[10]  C. Greg Plaxton,et al.  Deterministic sorting in nearly logarithmic time on the hypercube and related computers , 1990, STOC '90.

[11]  Kenneth E. Batcher,et al.  Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.

[12]  Alok Aggarwal,et al.  Network Complexity of Sorting and Graph Problems and Simulating CRCW PRAMS by Interconnection Networks , 1988, AWOC.

[13]  C. G. Plaxton.  Load Balancing, Selection and Sorting on the Hypercube , 1989 .

[14]  S. R. Seidel,et al.  Binsorting on hypercubes with d-port communication , 1989, C3P.

[15]  S. Lennart Johnsson,et al.  Distributed Routing Algorithms for Broadcasting and Personalized Communication in Hypercubes , 1986, ICPP.

[16]  Jorge L. C. Sanz,et al.  Cubesort: An Optimal Sorting Algorithm for Feasible Parallel Computers , 1988, AWOC.