Software Implementation of Floating-Point Arithmetic

The previous chapter presented the basic paradigms for implementing floating-point arithmetic in hardware. However, some processors lack such dedicated hardware, mainly for cost reasons. When floating-point numbers must be handled on such processors, one solution is to implement floating-point arithmetic in software.
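To make the idea concrete, here is a minimal sketch of one operation, a binary32 multiplication, carried out using only integer operations on the 32-bit encodings. It is an illustration under stated assumptions, not the chapter's algorithm: the helper names (`to_bits`, `from_bits`, `soft_mul`) are hypothetical, only round-to-nearest-even is implemented, and zeros, subnormals, infinities, NaNs, overflow, and underflow are deliberately rejected rather than handled.

```python
import struct

BIAS = 127        # binary32 exponent bias
FRAC_BITS = 23    # number of stored fraction bits


def to_bits(x):
    """Round a Python float to binary32 and return its 32-bit encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def from_bits(bits):
    """Interpret a 32-bit encoding as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def soft_mul(a, b):
    """Multiply two binary32 encodings with integer operations only.

    Assumption of this sketch: both inputs and the result are normal
    numbers; every other case raises ValueError.
    """
    sign = (a ^ b) >> 31
    ea = (a >> FRAC_BITS) & 0xFF
    eb = (b >> FRAC_BITS) & 0xFF
    if ea in (0, 255) or eb in (0, 255):
        raise ValueError("zero, subnormal, infinity and NaN are not handled")

    # Restore the implicit leading 1 to obtain 24-bit integer significands.
    ma = (a & 0x7FFFFF) | (1 << FRAC_BITS)
    mb = (b & 0x7FFFFF) | (1 << FRAC_BITS)

    # Exact 48-bit product, lying in [2^46, 2^48).
    p = ma * mb
    e = ea + eb - BIAS

    # Normalize so that the kept significand q lies in [2^23, 2^24).
    shift = FRAC_BITS
    if p >> 47:
        shift += 1
        e += 1
    q = p >> shift
    rem = p & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if rem > half or (rem == half and (q & 1)):
        q += 1
        if q >> 24:  # rounding carried out of the significand
            q >>= 1
            e += 1

    if not 1 <= e <= 254:
        raise ValueError("overflow and underflow are not handled")
    return (sign << 31) | (e << FRAC_BITS) | (q & 0x7FFFFF)
```

For example, `from_bits(soft_mul(to_bits(1.5), to_bits(2.5)))` evaluates to `3.75`. Python integers are unbounded, so the 48-bit product poses no difficulty here; on a 32-bit integer processor the same step would need a 32x32 to 64-bit multiplication or a split into partial products.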
