An FPGA-specific approach to floating-point accumulation and sum-of-products

This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators that are both more efficient and more accurate than those offered by processors and GPUs. First, for applications that add a large number of floating-point values, an ad-hoc accumulator is proposed. By tailoring its parameters to the numerical requirements of the application, it can be made arbitrarily accurate, at an area cost comparable to that of a standard floating-point adder, while running at a higher frequency. The second example is the sum-of-products operation, which is the building block of matrix computations. A novel architecture is proposed in which a floating-point multiplier whose rounding logic has been removed feeds the accumulator described above, again improving the area/accuracy tradeoff. These architectures are implemented within the FloPoCo generator, freely available under the LGPL.
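The two ideas can be illustrated with a small software model. The sketch below is not the FloPoCo implementation or its interface: the class name `FixedPointAccumulator`, the parameters `lsb_weight` and `msb_weight`, and the method `add_product` are hypothetical names chosen for illustration. It assumes the accumulator is a wide two's complement fixed-point register whose least and most significant bit weights are chosen from the application's expected dynamic range, and it models the multiplier without rounding logic by passing the exact product to the accumulator.

```python
import math
from fractions import Fraction


class FixedPointAccumulator:
    """Software model of a wide fixed-point accumulator (illustrative only).

    lsb_weight and msb_weight are hypothetical parameters: the accumulator
    holds bits of weight 2**lsb_weight up to 2**msb_weight (sign bit).
    """

    def __init__(self, lsb_weight: int, msb_weight: int):
        if msb_weight <= lsb_weight:
            raise ValueError("msb_weight must exceed lsb_weight")
        self.lsb = lsb_weight
        self.limit = 1 << (msb_weight - lsb_weight)  # two's complement bound
        self.acc = 0  # integer count of units of 2**lsb_weight

    def _add_exact(self, x: Fraction) -> None:
        # Align x on the accumulator grid; bits below the LSB are truncated,
        # as a hardware shifter would drop them.
        self.acc += math.floor(x / Fraction(2) ** self.lsb)
        if not -self.limit <= self.acc < self.limit:
            raise OverflowError("accumulator range too small for this data")

    def add(self, x: float) -> None:
        """Accumulate one floating-point value."""
        self._add_exact(Fraction(x))

    def add_product(self, a: float, b: float) -> None:
        """Accumulate a*b without rounding the product first."""
        self._add_exact(Fraction(a) * Fraction(b))

    def value(self) -> float:
        """Convert the accumulated sum back to floating point (one rounding)."""
        return float(Fraction(self.acc) * Fraction(2) ** self.lsb)


def naive_sum(values):
    """Plain left-to-right floating-point summation, rounding at every step."""
    total = 0.0
    for v in values:
        total += v
    return total


# Accumulation: the small term is absorbed by the naive sum.
data = [1e16, 1.0, -1e16]
acc = FixedPointAccumulator(lsb_weight=-10, msb_weight=60)
for v in data:
    acc.add(v)
print(naive_sum(data), acc.value())  # prints "0.0 1.0"

# Sum of products: rounding each product loses the 2**-60 term.
x = 1.0 + 2.0 ** -30
pairs = [(x, x), (-1.0, 1.0 + 2.0 ** -29)]
dot = FixedPointAccumulator(lsb_weight=-70, msb_weight=10)
for a, b in pairs:
    dot.add_product(a, b)
rounded_dot = naive_sum(a * b for a, b in pairs)
print(rounded_dot, dot.value() == 2.0 ** -60)  # prints "0.0 True"
```

In this model the only rounding happens once, in `value()`, which is why widening the accumulator by lowering `lsb_weight` improves accuracy. A hardware version would pay for that extra width in area, which is the tradeoff the abstract refers to.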
