Design and Clocking of VLSI Multipliers

This thesis presents a versatile new multiplier architecture that can provide better performance than conventional linear array multipliers at a fraction of the silicon area. The high performance is obtained by using a new binary tree structure, the 4-2 tree. The 4-2 tree is symmetric and far more regular than other multiplier trees while offering comparable performance, making it better suited for VLSI implementations. To reduce area, a partial, pipelined 4-2 tree is used, with a 4-2 carry-save accumulator placed at its outputs to iteratively sum the partial products as they are generated. Maximum performance is obtained by accurately matching the iterative clock to the pipeline rate of the 4-2 tree, using a stoppable on-chip clock generator. To prove the new architecture, a test chip called SPIM was fabricated in a 1.6 µm CMOS process. SPIM contains 41,000 transistors with an array size of 2.9 × 5.3 mm. Running at an internal clock frequency of 85 MHz, SPIM performs the 64-bit mantissa portion of a double extended precision floating-point multiply in under 120 ns. To make the new architecture commercially interesting, several high-performance rounding algorithms compatible with IEEE Standard 754 for binary floating-point arithmetic have also been developed.
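
The idea of a partial 4-2 tree feeding a 4-2 carry-save accumulator can be illustrated with a small behavioral sketch. This is not the SPIM circuit: it assumes unsigned operands, plain (non-Booth) partial products, a hypothetical batch of four partial products per iteration, and a 4-2 stage modeled as two cascaded 3:2 carry-save adders on Python integers. The names `csa`, `compress_4_2`, and `iterative_multiply` are illustrative only.

```python
BATCH = 4  # assumed number of partial products reduced per iteration


def csa(a, b, c):
    """3:2 carry-save adder: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted one place
    return s, carry


def compress_4_2(a, b, c, d):
    """4-2 stage modeled as two cascaded 3:2 adders: a + b + c + d == s + carry."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def iterative_multiply(x, y, width=64):
    """Multiply unsigned x by unsigned y (y < 2**width) in carry-save form."""
    acc_s, acc_c = 0, 0  # running result kept as a (sum, carry) pair
    for base in range(0, width, BATCH):
        # Generate one batch of shifted partial products (zero where the y bit is 0).
        pps = [(x << i) if (y >> i) & 1 else 0
               for i in range(base, min(base + BATCH, width))]
        pps += [0] * (BATCH - len(pps))          # pad the last batch if needed
        # "Partial tree": reduce the batch of four to a (sum, carry) pair.
        t_s, t_c = compress_4_2(*pps)
        # "Accumulator": fold the new pair into the running pair, still carry-save.
        acc_s, acc_c = compress_4_2(t_s, t_c, acc_s, acc_c)
    # A single carry-propagate addition at the end yields the product.
    return acc_s + acc_c
```

In this sketch no carry is propagated across the word until the final addition, which is the property that lets each iteration stay short; the iteration count is simply `width / BATCH` under the stated assumptions.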
