Constructing H4, a Fast Depth-Size Optimal Parallel Prefix Circuit

Given n values x₁, x₂, …, xₙ and an associative binary operation ⊗, the prefix problem is to compute x₁⊗x₂⊗⋯⊗xᵢ for each i, 1 ≤ i ≤ n. Prefix circuits are combinational circuits that solve the prefix problem. Every n-input prefix circuit D with depth d and size s satisfies d + s ≥ 2n − 2; hence, if d + s = 2n − 2, D is depth-size optimal. In general, a prefix circuit with smaller depth is faster. Among prefix circuits of the same depth, one with smaller fan-out occupies less area and is faster in a VLSI implementation. This paper concerns the construction of parallel prefix circuits that are depth-size optimal and have both small depth and small fan-out. We construct a depth-size optimal prefix circuit, H4, with fan-out 4. It has the smallest depth among all known depth-size optimal prefix circuits with constant fan-out; furthermore, when n ≥ 136, its depth is less than or equal to that of every known depth-size optimal prefix circuit with unlimited fan-out. A lower bound on the size of prefix circuits is also derived. Several properties related to depth-size optimality and size optimality are introduced and used to prove that H4 is depth-size optimal.
