The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints

Prefix computation is a basic operation at the core of many important applications, e.g., some of the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry. In this paper, we present new, strictly time-optimal parallel schedules for prefix computation with resource constraints under the concurrent-read-exclusive-write (CREW) parallel random access machine (PRAM) model. For the prefix of N elements on p processors (p independent of N) when N > p(p+1)/2, we derive Harmonic Schedules that achieve the strict optimal time (in steps), ⌈2(N-1)/(p+1)⌉. We also derive Pipelined Schedules that have better program-space efficiency than the Harmonic Schedules, yet require only a small constant number of steps more than the optimal time achieved by the Harmonic Schedules. Both the Harmonic Schedules and the Pipelined Schedules are simple and easy to implement. For the prefix of N elements on p processors (p independent of N) where N ≤ p(p+1)/2, the Harmonic Schedules are not time-optimal. For these cases, we establish an optimization method for determining the key parameters of time-optimal schedules, based on connections between the structure of parallel prefix and Pascal's triangle. Using the derived parameters, we devise an algorithm to construct such schedules. For a restricted class of values of N and p, we prove that the constructed schedules are strictly time-optimal. We also give strong empirical evidence that our algorithm constructs strictly time-optimal schedules for all cases where N ≤ p(p+1)/2.
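As a quick illustration of the two quantities the abstract relies on, the sketch below evaluates the strict optimal step count ⌈2(N-1)/(p+1)⌉ and tests whether (N, p) falls in the N > p(p+1)/2 range where the Harmonic Schedules are stated to reach it. The helper names `strict_time_bound` and `harmonic_regime` are hypothetical, chosen only for this example; the code does not construct a schedule, it only evaluates the formulas. A useful sanity check is p = 1: the bound reduces to N-1, the number of binary operations in a plain sequential prefix.

```python
def strict_time_bound(n: int, p: int) -> int:
    """Hypothetical helper: ceil(2(N-1)/(p+1)) steps for N elements on p processors."""
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive integers")
    # Integer ceiling division avoids floating-point rounding for large N.
    return -(-2 * (n - 1) // (p + 1))


def harmonic_regime(n: int, p: int) -> bool:
    """Hypothetical helper: True when N > p(p+1)/2.

    Compared as 2N > p(p+1) so the test stays in exact integer arithmetic.
    """
    return 2 * n > p * (p + 1)


if __name__ == "__main__":
    for n, p in [(100, 4), (10, 1), (8, 4)]:
        print(n, p, strict_time_bound(n, p), harmonic_regime(n, p))
```

For example, N = 100 and p = 4 gives 40 steps and lies in the Harmonic range, while N = 8 and p = 4 does not (8 ≤ 10), so the abstract's separate parameter-optimization method would apply there.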
