A mixed-precision fused multiply and add

The floating-point fused multiply and add, computing R=AB+C with a single rounding, is now an IEEE-754 standard operator. This article investigates variants in which the addend C and the result R are of a larger format, for instance binary64 (double precision), while the multiplier inputs A and B are of a smaller format, for instance binary32 (single precision). Like the standard FMA operator, the proposed mixed-precision operator computes AB+C with a single rounding and fully supports subnormals. With minor modifications, it can also perform the standard FMA in the smaller format and the standard addition in the larger format.
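
The behaviour described above can be illustrated with a small software reference model. This is only a sketch of the operator's numerical specification for the binary32/binary64 case, not the hardware architecture proposed in the article. It relies on the fact that the product of two binary32 values is exactly representable in binary64 (two 24-bit significands give at most 48 bits, and the exponent range fits), so the binary64 addition is the only rounding. The names `to_binary32` and `mixed_fma` are hypothetical helpers introduced for illustration, and the sketch assumes the platform evaluates binary64 arithmetic without extended precision.

```python
import struct
from fractions import Fraction


def to_binary32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_fma(a: float, b: float, c: float) -> float:
    """Reference model of R = A*B + C with A, B in binary32 and C, R in binary64.

    Hypothetical helper: a and b must already hold binary32 values.
    The product a * b is exact in binary64, so the addition performs
    the single rounding of the whole operation.
    """
    assert a == to_binary32(a) and b == to_binary32(b), "a and b must be binary32 values"
    return a * b + c


def exact_reference(a: float, b: float, c: float) -> float:
    """Compute A*B + C exactly with rationals, then round once to binary64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


if __name__ == "__main__":
    a = to_binary32(1 + 2**-23)          # smallest binary32 value above 1
    r = mixed_fma(a, a, -1.0)
    # The low-order term 2**-46 of the product survives in the result.
    print(r == 2**-22 + 2**-46)
    # Same result as an exact computation followed by one rounding.
    print(r == exact_reference(a, a, -1.0))
```

For comparison, a pure binary32 multiplication would round a*a to 1 + 2^-22 before the addition, and the 2^-46 term would be lost. Products of binary32 subnormals are also handled exactly by this model, since they fall within the normal range of binary64.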
