New parallel prefix algorithms

New families of computation-efficient parallel prefix algorithms for message-passing multicomputers are presented. The first family improves the communication time of a previous family of parallel prefix algorithms; both use only half-duplex communications. Two other families adopt collective communication operations to reduce the communication times of the former two, respectively. Each family offers a choice between fewer computation time steps and fewer communication time steps, so that the minimal running time can be achieved for a given ratio of the time required by a communication step to the time required by a computation step.
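The trade-off between computation steps and communication steps can be illustrated with a small simulation. The sketch below is not one of the algorithms presented here: it uses the textbook recursive-doubling prefix, in which a processor may send and receive in the same step (unlike the half-duplex model above), and it assumes a simple cost model in which running time is the number of computation steps times `t_comp` plus the number of communication steps times `t_comm`. The function names and the step counts for variants "A" and "B" are hypothetical and chosen only for illustration.

```python
from operator import add


def recursive_doubling_prefix(values, op=add):
    """Simulate an inclusive prefix over p processors holding one value each.

    Returns (prefixes, communication_steps, computation_steps).
    """
    result = list(values)
    p = len(result)
    comm_steps = comp_steps = 0
    distance = 1
    while distance < p:
        # Communication step: processor i sends its partial result to i + distance.
        received = {i + distance: result[i] for i in range(p - distance)}
        comm_steps += 1
        # Computation step: each receiver combines the incoming value on the left,
        # which keeps the result correct for non-commutative operators.
        for i, incoming in received.items():
            result[i] = op(incoming, result[i])
        comp_steps += 1
        distance *= 2
    return result, comm_steps, comp_steps


def running_time(comp_steps, comm_steps, t_comp, t_comm):
    """Assumed cost model: steps are sequential and each has a fixed cost."""
    return comp_steps * t_comp + comm_steps * t_comm


def pick_fastest(candidates, t_comp, t_comm):
    """candidates maps a variant name to (comp_steps, comm_steps)."""
    return min(
        candidates,
        key=lambda name: running_time(*candidates[name], t_comp, t_comm),
    )


if __name__ == "__main__":
    prefixes, comm, comp = recursive_doubling_prefix([1, 2, 3, 4, 5, 6, 7, 8])
    print(prefixes, comm, comp)

    # Hypothetical variants: A uses more computation and less communication,
    # B the reverse. Which one wins depends on the ratio t_comm / t_comp.
    variants = {"A": (6, 3), "B": (4, 5)}
    print(pick_fastest(variants, t_comp=1, t_comm=4))
    print(pick_fastest(variants, t_comp=4, t_comm=1))
```

When a communication step is expensive relative to a computation step, the variant with fewer communication steps gives the shorter running time under this model, and the choice reverses when computation dominates.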
