A 6.2-GFlops Floating-Point Multiply-Accumulator With Conditional Normalization

A pipelined single-precision floating-point multiply-accumulator (FPMAC) featuring a single-cycle accumulate loop using base 32 and internal carry-save arithmetic with delayed addition is described. A combination of algorithmic, logic, and circuit techniques enables multiply-accumulate operations at speeds exceeding 3 GHz with single-cycle throughput. The optimizations allow removal of the costly normalization step from the critical accumulate loop. The normalization logic is conditionally powered down using dynamic sleep transistors on long accumulate operations, saving active and leakage power. In addition, an improved leading-zero anticipator (LZA) and overflow prediction logic applicable to the carry-save format are presented. In a 90-nm seven-metal dual-VT CMOS process, the 2-mm² custom design contains 230K transistors. The fully functional first silicon achieves 6.2 GFlops of performance while dissipating 1.2 W at 3.1 GHz and a 1.3-V supply.
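The idea of carry-save accumulation with delayed addition can be illustrated with a minimal software sketch. It is not the circuit described above: it works on plain unsigned integers rather than floating-point mantissas, and the names `carry_save_add` and `accumulate_carry_save`, as well as the 64-bit width, are assumptions made for illustration. The point it shows is that each loop iteration needs only bitwise operations (no carry propagation), and the single carry-propagating addition is deferred until the accumulation is finished.

```python
def carry_save_add(a, b, c, width=64):
    """3:2 compressor: reduce three operands to a (sum, carry) pair.

    Uses the identity a + b + c == (a ^ b ^ c) + 2 * majority(a, b, c),
    evaluated modulo 2**width. No carry ripples across bit positions here.
    """
    mask = (1 << width) - 1
    partial_sum = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return partial_sum, carry


def accumulate_carry_save(values, width=64):
    """Accumulate unsigned integers while keeping the running total in
    carry-save form; the carry-propagating add is delayed to the very end."""
    mask = (1 << width) - 1
    partial_sum, carry = 0, 0
    for value in values:
        # Loop body: bitwise logic only, the software analogue of a short
        # accumulate loop with no full adder in it.
        partial_sum, carry = carry_save_add(partial_sum, carry, value & mask, width)
    # Delayed addition: one full-width add resolves the redundant form.
    return (partial_sum + carry) & mask


if __name__ == "__main__":
    data = [3, 5, 7, 11, 13]
    print(accumulate_carry_save(data) == sum(data))
```

In this sketch the running total is never a single number until the final line, which mirrors why a redundant (carry-save) accumulator needs dedicated leading-zero anticipation and overflow prediction: those quantities cannot be read directly off one resolved value.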
