Implementing the multiprefix operation on parallel and vector computers

For an ordered set of n values, each with an associated integer label, the multiprefix operation calculates a partial sum for each value that is the sum of all preceding values with the same label. The multiprefix operation has been proposed as a parallel primitive because it can express many data-parallel algorithms succinctly. However, most approaches to implementing this operation have either used integer sorting to gather elements with the same label together or have proposed special hardware. In this paper we present a work-efficient algorithm for the multiprefix operation on n elements that runs in $S = O(\sqrt{n})$ parallel steps on a $p = \sqrt{n}$ processor CRCW-ARB PRAM, so that its total work, $p \cdot S = O(n)$, matches that of the sequential algorithm. The CRCW-ARB model guarantees only that when multiple processors write to the same location, an arbitrary one of them succeeds. We use this feature to resolve data dependencies only in the first phase of the algorithm, so that all later steps require only EREW memory access. A fully vectorized version of our algorithm has been designed for the CRAY Y-MP and provides good performance for a number of important algorithms. For the integer sorting test of the NAS benchmarks, our multiprefix operation was used to create an algorithm that is competitive in performance with the current best algorithms for that machine. As another example, we show that by using the multiprefix operation for sparse-matrix dense-vector multiplication, we obtain performance exceeding that of traditional FORTRAN-based approaches. Finally, our algorithm also makes possible the simulation of a CRCW-PLUS PRAM on a p-processor CRCW-ARB PRAM with only constant slowdown for problem sizes $n \geq p^2$.
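To make the definition concrete, the following Python sketch is our own illustration, not the paper's CRAY Y-MP code. It gives a sequential reference version of multiprefix. It also includes a blocked variant that splits the input into roughly $\sqrt{n}$ blocks of roughly $\sqrt{n}$ elements, which mirrors the $p = \sqrt{n}$, $S = O(\sqrt{n})$ work balance. Finally, it shows the two applications mentioned above. The blocked variant is a generic three-phase decomposition chosen for illustration. It uses dictionaries where the paper uses arbitrary-winner concurrent writes, and it does not reproduce the paper's CRCW-ARB conflict-resolution phase. The function names (`multiprefix`, `multiprefix_blocked`, `rank_sort`, `spmv_coo`) and the COO (row, column, value) sparse-matrix layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def multiprefix(values: Sequence[int], labels: Sequence[int]) -> Tuple[List[int], Dict[int, int]]:
    """Sequential reference: out[i] = sum of values[j] for j < i with labels[j] == labels[i].
    Also returns the per-label totals (a 'multireduce')."""
    running: Dict[int, int] = {}
    out: List[int] = []
    for v, lab in zip(values, labels):
        out.append(running.get(lab, 0))           # sum of earlier values with this label
        running[lab] = running.get(lab, 0) + v
    return out, running


def multiprefix_blocked(values: Sequence[int], labels: Sequence[int]) -> Tuple[List[int], Dict[int, int]]:
    """Same result, organised as ~sqrt(n) blocks of ~sqrt(n) elements
    (one block per hypothetical processor), simulated sequentially."""
    n = len(values)
    if n == 0:
        return [], {}
    p = max(1, math.isqrt(n))                     # number of blocks ~ sqrt(n)
    size = -(-n // p)                             # ceil(n / p) elements per block
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    # Phase 1: each block computes a local multiprefix and its per-label totals.
    local = [0] * n
    block_totals = []
    for blk in blocks:
        running: Dict[int, int] = {}
        for i in blk:
            local[i] = running.get(labels[i], 0)
            running[labels[i]] = local[i] + values[i]
        block_totals.append(running)
    # Phase 2: per label, an exclusive scan over block totals gives each block's offset.
    carry: Dict[int, int] = {}
    offsets = []
    for totals in block_totals:
        offsets.append({lab: carry.get(lab, 0) for lab in totals})
        for lab, t in totals.items():
            carry[lab] = carry.get(lab, 0) + t
    # Phase 3: add each block's incoming offset to its local results (blocks are contiguous).
    out = [local[i] + offsets[b][labels[i]] for b, blk in enumerate(blocks) for i in blk]
    return out, carry


def rank_sort(keys: Sequence[int], num_buckets: int) -> List[int]:
    """Stable counting sort of keys in [0, num_buckets) expressed with multiprefix."""
    within, counts = multiprefix([1] * len(keys), keys)  # rank among equal keys
    base, total = [], 0
    for k in range(num_buckets):                  # exclusive scan of bucket sizes
        base.append(total)
        total += counts.get(k, 0)
    out = [0] * len(keys)
    for i, k in enumerate(keys):
        out[base[k] + within[i]] = k
    return out


def spmv_coo(rows: Sequence[int], cols: Sequence[int], vals: Sequence[int],
             x: Sequence[int], num_rows: int) -> List[int]:
    """y = A x for A given as COO triples; the row label sums are a multireduce."""
    products = [a * x[c] for a, c in zip(vals, cols)]
    _, row_sums = multiprefix(products, rows)
    return [row_sums.get(r, 0) for r in range(num_rows)]


if __name__ == "__main__":
    print(multiprefix([1, 2, 3, 4, 5], [0, 1, 0, 1, 0]))  # ([0, 0, 1, 2, 4], {0: 9, 1: 6})
    print(rank_sort([3, 1, 2, 1, 0, 3], 4))               # [0, 1, 1, 2, 3, 3]
```

The per-label totals returned alongside the partial sums are what the sparse matrix-vector example relies on. Each row's dot product is simply the total for that row's label, while the integer sort uses the partial sums as each key's rank among equal keys.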