Efficient implementation of 3X for radix-8 encoding

Several commercial processors have adopted the radix-8 multiplier architecture to increase speed by reducing the number of partial products. Radix-8 encoding reduces the number of digits in the signed-digit representation of the multiplier. The performance bottleneck of this approach is the generation of the term 3X, also referred to as the hard multiple. This term is usually computed with a shift-and-add operation, 3X = 2X + X, in a high-speed adder. In a 2X + X addition, adjacent full adders share the same input signal, which allows the 3X operation to be described by simpler algebraic expressions than a conventional addition. This paper shows that the 3X operation can be expressed in terms of two signals, H''_i and K''_i, that are functionally equivalent to two carries. H''_i and K''_i are computed in parallel using architectures that lead to an area- and speed-efficient implementation. For comparison, standard-cell implementations of conventional adders were evaluated against the proposed circuits based on the H''_i and K''_i signals. The delay of the proposed serial scheme is reduced by roughly 67% with no additional area cost; the delay and area of the carry look-ahead scheme are reduced by 20% and 17%, respectively, and those of the parallel-prefix scheme by 26% and 46%, respectively.
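To make the role of the hard multiple concrete, the following Python sketch shows conventional radix-8 Booth recoding and a plain bit-level ripple-carry computation of 3X = 2X + X. It is a textbook formulation, not the circuits proposed in this paper. The function names (`booth_radix8_digits`, `triple_by_ripple`, `radix8_multiply`) are hypothetical, and the sketch assumes an unsigned n-bit multiplicand x, a two's-complement n-bit multiplier y, and n a multiple of 3. The comments in `triple_by_ripple` point out the property the abstract relies on: bit x[i] feeds both full adder i and full adder i+1.

```python
def booth_radix8_digits(y: int, n: int) -> list[int]:
    """Radix-8 Booth recoding of an n-bit two's-complement multiplier y.

    Digit j = -4*y[3j+2] + 2*y[3j+1] + y[3j] + y[3j-1], with y[-1] = 0,
    so every digit lies in {-4, ..., 4} and y == sum(d_j * 8**j).
    """
    assert n % 3 == 0, "this sketch assumes n is a multiple of 3"

    def bit(k: int) -> int:
        # Python's >> on negative ints sign-extends, giving two's-complement bits.
        return 0 if k < 0 else (y >> k) & 1

    return [-4 * bit(3 * j + 2) + 2 * bit(3 * j + 1) + bit(3 * j) + bit(3 * j - 1)
            for j in range(n // 3)]


def triple_by_ripple(x: int, n: int) -> int:
    """Compute the hard multiple 3X = 2X + X for an unsigned n-bit x, bit by bit.

    Full adder i adds a_i = x[i-1] (bit i of 2X) and b_i = x[i] (bit i of X),
    so x[i] feeds both adder i and adder i+1: adjacent adders share an input.
    """
    def bit(k: int) -> int:
        return 0 if k < 0 else (x >> k) & 1

    result, carry = 0, 0
    for i in range(n + 2):              # 3X of an n-bit value fits in n+2 bits
        a, b = bit(i - 1), bit(i)
        g, p = a & b, a ^ b             # generate / propagate for a 2X + X stage
        result |= (p ^ carry) << i      # sum bit
        carry = g | (p & carry)         # ripple carry into stage i+1
    return result


def radix8_multiply(x: int, y: int, n: int) -> int:
    """Multiply unsigned n-bit x by two's-complement n-bit y with n/3 partial products."""
    multiples = {0: 0, 1: x, 2: x << 1, 3: triple_by_ripple(x, n), 4: x << 2}
    total = 0
    for j, d in enumerate(booth_radix8_digits(y, n)):
        pp = multiples[abs(d)]          # 2X and 4X are shifts; only 3X needs an adder
        total += (-pp if d < 0 else pp) << (3 * j)
    return total


if __name__ == "__main__":
    print(booth_radix8_digits(7, 6))      # [-1, 1]
    print(triple_by_ripple(5, 4))         # 15
    print(radix8_multiply(100, -77, 12))  # -7700
```

For a 12-bit multiplier this recoding yields 4 partial products instead of the 6 produced by radix-4 Booth recoding, but it needs the precomputed 3X. The serial carry chain in `triple_by_ripple` is the kind of path that the serial, carry look-ahead, and parallel-prefix schemes discussed in the abstract are designed to speed up; the sketch does not model their delay or area.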
