Scalable Hardware-Algorithms for Binary Prefix Sums

We address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k$-bit binary sequence ($k \ge 2$), using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of 2. An immediate consequence of this feature is that, in our designs, broadcasts are limited to buses of length at most $w^2$. We adopt a VLSI delay model in which the "length" of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k$-bit binary sequence in the time of $2k-2$ broadcasts; the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Remarkably, even though this hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^2$-bit binary sequence in the time of $3k + \lceil \log_w k \rceil - 3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k + \lceil \log_w k \rceil - 5$ broadcasts.
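As a quick sanity check on the costs quoted above, the following Python sketch transcribes the four broadcast counts from the abstract into small functions and adds a plain software definition of binary prefix sums, the quantity the hardware computes. It does not model shift switches or buses. The function names are hypothetical, and reading the bracketed logarithm as a ceiling is an assumption.

```python
def ceil_log(w: int, k: int) -> int:
    """Smallest integer e >= 0 with w**e >= k, i.e. ceil(log_w k), using exact integer math."""
    e, p = 0, 1
    while p < k:
        p *= w
        e += 1
    return e


def first_design(k: int) -> dict:
    # Transcribed from the abstract: sum of a w^k-bit sequence (k >= 2)
    # in 2k-2 broadcasts, prefix sums in 3k-4 broadcasts.
    if k < 2:
        raise ValueError("the abstract assumes k >= 2")
    return {"sum": 2 * k - 2, "prefix_sums": 3 * k - 4}


def pipelined_design(w: int, k: int) -> dict:
    # Transcribed from the abstract (ceiling reading assumed): sum of a
    # k*w^2-bit sequence in 3k + ceil(log_w k) - 3 broadcasts,
    # prefix sums in 4k + ceil(log_w k) - 5 broadcasts.
    c = ceil_log(w, k)
    return {"sum": 3 * k + c - 3, "prefix_sums": 4 * k + c - 5}


def prefix_sums(bits: list) -> list:
    # Reference definition: out[i] = bits[0] + bits[1] + ... + bits[i].
    out, running = [], 0
    for b in bits:
        running += b
        out.append(running)
    return out


if __name__ == "__main__":
    print(first_design(4))         # w = 4, k = 4: a 256-bit input
    print(pipelined_design(4, 4))  # w = 4, k = 4: a 64-bit (k * w^2) input
    print(prefix_sums([1, 0, 1, 1]))
```

For example, with $w = 4$ and $k = 4$ the first design handles a 256-bit sequence, with the sum in 6 broadcasts and the prefix sums in 8; the pipelined design handles a 64-bit sequence, with the sum in 10 broadcasts and the prefix sums in 12. The logarithm is computed with an integer loop rather than `math.log` to avoid floating-point rounding at exact powers of $w$.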
