Paper information - Floating-point fused multiply-add with reduced latency

Floating-point fused multiply-add with reduced latency

We propose an architecture for the computation of the floating-point multiply-add-fused (MAF) operation A + (B × C). This architecture is based on combined addition and rounding (using a dual adder) and on anticipating the normalization step before the addition. Because the normalization is performed before the addition, it is not possible to overlap the leading-zero anticipator (LZA) with the adder. Consequently, to avoid an increase in delay, we modify the design of the LZA so that the leading bits of its output are produced first and can be used to begin the normalization. Moreover, parts of the addition are also anticipated. We have estimated the delay of the resulting architecture for the double-precision format, taking into account the load introduced by long connections, and obtain a reduction of about 15% to 20% with respect to traditional implementations of the floating-point MAF unit.
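As a rough illustration of what the MAF operation computes (not of the proposed hardware), the sketch below contrasts a fused A + (B × C), which rounds once at the end, with a separate multiply followed by an add, which rounds twice. The function names `fused_multiply_add` and `unfused_multiply_add` are hypothetical, and the fused result is modeled in software with exact rational arithmetic under the assumption of IEEE double precision with round-to-nearest-even. The chosen operands cancel almost completely, which is the kind of case where the result has many leading zeros and normalization matters.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Software model of A + (B x C) with a single final rounding.

    Assumption: this is only a reference model for the arithmetic result;
    it says nothing about the dual adder or LZA described in the abstract.
    """
    # Every finite double converts to an exact rational number.
    exact = Fraction(a) + Fraction(b) * Fraction(c)
    # Converting back to float performs the one and only rounding step.
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the addition."""
    return a + b * c


# Operands chosen so that B x C = 1 - 2**-60 exactly, which is not
# representable in double precision and rounds to 1.0 when unfused.
a = -1.0
b = 1.0 + 2.0 ** -30
c = 1.0 - 2.0 ** -30

print("unfused:", unfused_multiply_add(a, b, c))
print("fused:  ", fused_multiply_add(a, b, c))
```

In this case the unfused version loses the result entirely, while the fused version keeps the small residual. Recovering that residual requires shifting out a long run of leading zeros, which is the job of the normalization step that the proposed architecture moves ahead of the addition.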

T. Lang | J.D. Bruguera

[1] Mark Horowitz, et al. Rounding algorithms for IEEE multipliers, 1989, Proceedings of 9th Symposium on Computer Arithmetic.

[2] Harsh Sharangpani, et al. Itanium Processor Microarchitecture, 2000, IEEE Micro.

[3] Peter-Michael Seidel, et al. A comparison of three rounding algorithms for IEEE floating-point multiplication, 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[4] C.N. Hinds, et al. An enhanced floating point coprocessor for embedded signal processing and graphics applications, 1999, Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems, and Computers (Cat. No.CH37020).

[5] Gregory B. Zyner, et al. 167 MHz radix-4 floating point multiplier, 1995, Proceedings of the 12th Symposium on Computer Arithmetic.

[6] Peter W. Cook, et al. Second-generation RISC floating point with multiply-add fused, 1990.

[7] Hewlett-Packard. The HP PA-8000 RISC CPU, 2022.

[8] Steven W. White, et al. POWER3: The next generation of PowerPC processors, 2000, IBM J. Res. Dev.

[9] Simon Knowles, et al. A family of adders, 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[10] Chichyang Chen, et al. Architectural design of a fast floating-point multiplication-add fused unit using signed-digit addition, 2001, Proceedings Euromicro Symposium on Digital Systems Design.

[11] Romesh M. Jessani, et al. Comparison of Single- and Dual-Pass Multiply-Add Fused Floating-Point Units, 1998, IEEE Trans. Computers.

[12] Michael J. Flynn, et al. The SNAP project: design of floating point arithmetic units, 1997, Proceedings 13th IEEE Symposium on Computer Arithmetic.