A New Architecture For Multiple-Precision Floating-Point Multiply-Add Fused Unit Design

The floating-point multiply-add fused (MAF) unit represents a new trend in processor design for accelerating floating-point performance in scientific and multimedia applications. This paper proposes a new architecture for a MAF unit that supports multiply-add operations (A×B+C) in multiple IEEE precisions with a Single Instruction Multiple Data (SIMD) feature. The proposed MAF unit can perform either one double-precision operation or two parallel single-precision operations. It uses about 18% more hardware than a conventional double-precision MAF unit and has a 9% increase in delay. To accommodate the simultaneous computation of two single-precision MAF operations, several basic modules of the double-precision MAF unit are redesigned: each is either segmented by precision-mode-dependent multiplexers or augmented with duplicated hardware. The proposed MAF unit can be fully pipelined, and the experimental results show that it is suitable for processors with a floating-point unit (FPU).
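The abstract says some double-precision modules are segmented by precision-mode-dependent multiplexers so that one datapath can serve either one double-precision or two single-precision operations. The multiplier is the natural candidate for this. The following is a minimal integer-level sketch of how a 53×53-bit significand multiplier can be shared with two 24×24-bit products, assuming a hypothetical split of each operand into a 27-bit low segment and a 26-bit high segment. It is not the paper's circuit. The split point, the names `segmented_multiply`, `pack_sp` and `unpack_sp_product`, and the lane layout are illustrative assumptions. Exponent handling, addend alignment, normalization and rounding are omitted.

```python
"""Sketch: a significand multiplier shared between one double-precision
product and two parallel single-precision products (hypothetical layout)."""

DP_SIG_BITS = 53  # binary64 significand width, including the hidden bit
SP_SIG_BITS = 24  # binary32 significand width, including the hidden bit
SPLIT = 27        # assumed split point: low segment 27 bits, high segment 26 bits
LOW_MASK = (1 << SPLIT) - 1
PRODUCT_LANE = 2 * SPLIT  # each single-precision product gets a 54-bit lane


def segmented_multiply(a: int, b: int, sp_mode: bool) -> int:
    """Multiply 53-bit operands as a sum of four sub-products.

    Double-precision mode: all four sub-products are summed, giving a * b.
    Single-precision mode: the cross terms are forced to zero (modelling a
    mode-dependent multiplexer), so the low and high lanes of the result
    hold two independent 48-bit products.
    """
    assert 0 <= a < (1 << DP_SIG_BITS) and 0 <= b < (1 << DP_SIG_BITS)
    a_lo, a_hi = a & LOW_MASK, a >> SPLIT
    b_lo, b_hi = b & LOW_MASK, b >> SPLIT
    cross = 0 if sp_mode else a_lo * b_hi + a_hi * b_lo
    return ((a_hi * b_hi) << PRODUCT_LANE) + (cross << SPLIT) + a_lo * b_lo


def pack_sp(sig_hi: int, sig_lo: int) -> int:
    """Place two 24-bit significands in the two operand segments."""
    assert 0 <= sig_hi < (1 << SP_SIG_BITS) and 0 <= sig_lo < (1 << SP_SIG_BITS)
    return (sig_hi << SPLIT) | sig_lo


def unpack_sp_product(product: int) -> tuple[int, int]:
    """Split a single-precision-mode result into its (high, low) products."""
    return product >> PRODUCT_LANE, product & ((1 << PRODUCT_LANE) - 1)


if __name__ == "__main__":
    # Double-precision mode: one full 53 x 53-bit product.
    a, b = (1 << 52) | 12345, (1 << 52) | 67890
    assert segmented_multiply(a, b, sp_mode=False) == a * b

    # Single-precision mode: two independent 24 x 24-bit products.
    x1, x0 = 0xC00001, 0x800000
    y1, y0 = 0xFFFFFF, 0xABCDEF
    result = segmented_multiply(pack_sp(x1, x0), pack_sp(y1, y0), sp_mode=True)
    assert unpack_sp_product(result) == (x1 * y1, x0 * y0)
    print("both modes agree with direct multiplication")
```

The design choice this illustrates is that no second multiplier is needed. Each 24×24-bit product is smaller than 2^48, so it fits in its 54-bit lane without overlapping the other, and gating off the cross terms is enough to keep the two lanes independent. This sketch models only the arithmetic identity, not the area or delay figures reported above.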
