Comparison of Single- and Dual-Pass Multiply-Add Fused Floating-Point Units

Low power, low cost, and high performance dictate the design of many microprocessors targeted at the low-power computing market. The floating-point unit occupies a significant percentage of the silicon area in a microprocessor due to its wide data bandwidth (for double-precision computations) and the area occupied by the multiply array. For microprocessors designed for portable products, the size of the floating-point unit plays an important role in keeping cost low, since cost is driven by chip area. Some microprocessors have multiply-add fused floating-point units with a reduced multiply array, requiring two passes through the array for operations involving double-precision multiplies. This paper discusses the design complexities of the dual-pass multiply array and its effect on area and performance. Floating-point unit areas and their associated multiply array areas are compared for a single-pass and a dual-pass implementation in a given technology (the PowerPC 604e™ and PowerPC 603e™ microprocessors, respectively).
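The dual-pass idea can be illustrated with a minimal sketch, under stated assumptions: a 53-bit double-precision significand, and a multiplier operand split into a low part of 27 bits and a high part of 26 bits. The function name `dual_pass_multiply`, the `pass_bits` parameter, and the 27-bit split are hypothetical choices for illustration only; they are not taken from the paper and do not describe the actual array dimensions of the PowerPC 603e.

```python
def dual_pass_multiply(a: int, b: int, width: int = 53, pass_bits: int = 27) -> int:
    """Multiply two unsigned significands using a reduced array in two passes.

    Assumption: the reduced array handles `width` x `pass_bits` bits per pass.
    """
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    assert 2 * pass_bits >= width, "two passes must cover the whole multiplier"

    mask = (1 << pass_bits) - 1
    b_low = b & mask           # multiplier bits consumed in the first pass
    b_high = b >> pass_bits    # remaining multiplier bits for the second pass

    # Pass 1: the array forms the product with the low multiplier bits.
    partial = a * b_low

    # Pass 2: the same array is reused for the high bits; the result is
    # shifted into position and accumulated with the first-pass product.
    partial += (a * b_high) << pass_bits

    return partial


# Usage: the two-pass result matches a full-width multiply.
x = (1 << 52) | 0x000F_FFFF_1234_5
y = (1 << 52) | 0x0003_2109_8765_4
print(dual_pass_multiply(x, y) == x * y)
```

The sketch trades one extra pass for an array roughly half as tall, which is the area-versus-latency trade-off the abstract describes; it leaves out the fused addend, rounding, and normalization.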
