Modified Fused Multiply and Add for Exact Low Precision Product Accumulation

The implementation of the Fused Multiply and Add (FMA) operation has been studied extensively in the literature for standard and large precisions. We suggest revisiting those studies for 16-bit precision. We introduce a variation of the mixed-precision FMA targeted at applications that process low-precision inputs, such as machine learning. We also introduce several versions of a fixed-point-based floating-point FMA that performs an exact accumulation of products of binary16 numbers. We study the implementation and area footprint of those operators in comparison with standard FMAs.
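To illustrate why exact accumulation is feasible at this precision, the following is a minimal software sketch, not the hardware operators studied here. It relies on the fact that any finite binary16 value is an integer multiple of 2^-24, so the product of two such values is an integer multiple of 2^-48 and can be added without rounding into a fixed-point accumulator whose least significant bit weighs 2^-48. The names `decode_binary16`, `to_bits`, `exact_dot`, and `FRAC_BITS` are hypothetical and chosen for this example only. Python's unbounded integers stand in for the accumulator, whereas a hardware design would have to fix its width, including extra bits against overflow.

```python
import struct
from fractions import Fraction

# Assumption for this sketch: accumulator LSB has weight 2**-48, the smallest
# nonzero magnitude of a product of two binary16 values (2**-24 * 2**-24).
FRAC_BITS = 48


def decode_binary16(bits):
    """Decode a 16-bit pattern into (m, e) such that the value is m * 2**e."""
    sign = -1 if bits >> 15 else 1
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0x1F:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if exp_field == 0:
        # Zero or subnormal: no implicit leading one, fixed exponent.
        return sign * frac, -24
    # Normal: 1.frac * 2**(exp_field - 15) == (0x400 | frac) * 2**(exp_field - 25)
    return sign * (frac | 0x400), exp_field - 25


def to_bits(x):
    """Round a Python float to binary16 and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def exact_dot(xs, ys):
    """Accumulate the products x*y exactly; the result is in units of 2**-48."""
    acc = 0
    for x, y in zip(xs, ys):
        mx, ex = decode_binary16(to_bits(x))
        my, ey = decode_binary16(to_bits(y))
        # ex + ey >= -48, so the shift amount is never negative.
        acc += (mx * my) << (ex + ey + FRAC_BITS)
    return acc


def acc_to_fraction(acc):
    """Convert the fixed-point accumulator to an exact rational value."""
    return Fraction(acc, 1 << FRAC_BITS)
```

For example, accumulating 65504 × 65504, then 2^-24 × 2^-24, then −65504 × 65504 leaves exactly 2^-48 in the accumulator, a term that would be lost if the running sum were rounded to a floating-point format after each step.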
