Fast Inner Product Computation on Short Buses

We propose a VLSI inner product processor architecture that broadcasts only over short buses (each containing fewer than 64 switches). The architecture leads to an efficient algorithm for inner product computation. Specifically, computing the inner product of two arrays of N=29 elements, each consisting of m=64 bits, takes 13 broadcasts, each over fewer than 64 switches, plus the time of 2 carry-save additions (tcsa) and 2 carry-lookahead additions (tcla). Using the same order of VLSI area, our algorithm runs faster than the best known fast inner product algorithm of Smith and Torng [“Design of a fast inner product processor,” Proceedings of IEEE 7th Symposium on Computer Arithmetic (1985)], which takes about 28 tcsa
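The cost figures above are counted in carry-save additions (tcsa) and carry-lookahead additions (tcla). The following Python sketch shows what those two operations do in a conventional carry-save reduction of an inner product. It is not the short-bus architecture proposed here, and it is not claimed to reproduce Smith and Torng's design. The function names (`carry_save_add`, `partial_products`, `inner_product_csa`) are hypothetical, and Python's built-in integer addition stands in for the final carry-lookahead adder.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: turn three addends into a (sum, carry) pair.

    No carry ripples across bit positions, so in hardware this costs one
    full-adder delay regardless of word length. Invariant: a + b + c == s + carry.
    """
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, shifted left
    return s, carry


def partial_products(x, y, m):
    """Shifted copies of x, one per set bit among the low m bits of y."""
    return [x << i for i in range(m) if (y >> i) & 1]


def inner_product_csa(xs, ys, m):
    """Inner product of non-negative m-bit integers via carry-save reduction.

    Returns (value, levels), where levels counts the rounds of carry-save
    additions applied before the single carry-propagating addition.
    """
    terms = []
    for x, y in zip(xs, ys):
        terms.extend(partial_products(x, y, m))

    levels = 0
    while len(terms) > 2:
        reduced = []
        full = len(terms) - len(terms) % 3         # largest multiple of 3
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(terms[i], terms[i + 1], terms[i + 2]))
        reduced.extend(terms[full:])               # leftovers pass through
        terms = reduced
        levels += 1

    # Final carry-propagating addition (assumed stand-in for carry-lookahead).
    return sum(terms), levels


if __name__ == "__main__":
    value, levels = inner_product_csa([3, 5, 7], [2, 4, 6], m=4)
    print(value, levels)
```

Each pass through the `while` loop corresponds to one carry-save addition delay, so the returned `levels` gives a rough sense of why a pure carry-save tree needs many such delays as N and m grow.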
