Fast, Efficient Floating-Point Adders and Multipliers for FPGAs

Floating-point applications are a growing trend in the FPGA community. As such, it has become critical to create floating-point units optimized for standard FPGA technology. Unfortunately, the FPGA design space is very different from the VLSI design space; thus, optimizations for FPGAs can differ significantly from optimizations for VLSI. In particular, the FPGA environment constrains the design space such that only limited parallelism can be effectively exploited to reduce latency. Obtaining the right balances between clock speed, latency, and area in FPGAs can be particularly challenging. This article presents implementation details for an IEEE-754 standard floating-point adder and multiplier for FPGAs. The designs presented here enable a Xilinx Virtex4 FPGA (-11 speed grade) to achieve 270 MHz IEEE compliant double precision floating-point performance with a 9-stage adder pipeline and 14-stage multiplier pipeline. The area requirement is approximately 500 slices for the adder and under 750 slices for the multiplier.

[1]  Pavle Belanovic,et al.  A Library of Parameterized Floating-Point Modules and Their Use , 2002, FPL.

[2]  Viktor K. Prasanna,et al.  Sparse Matrix-Vector multiplication on FPGAs , 2005, FPGA '05.

[3]  Karl S. Hemmert,et al.  A CAD suite for high-performance FPGA design , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).

[4]  Viktor K. Prasanna,et al.  A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[5]  William D. Smith,et al.  Towards an RCC-Based Accelerator for Computational Fluid Dynamics Applications , 2004, The Journal of Supercomputing.

[6]  Peter-Michael Seidel,et al.  On the design of fast IEEE floating-point adders , 2001, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[7]  Peter-Michael Seidel,et al.  A comparison of three rounding algorithms for IEEE floating-point multiplication , 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[8]  Todd A. Cook,et al.  Implementation of IEEE single precision floating point addition and multiplication on FPGAs , 1996, 1996 Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[9]  Peter-Michael Seidel,et al.  A Comparison of Three Rounding Algorithms for IEEE Floating-Point Multiplication , 2000, IEEE Trans. Computers.

[10]  Guido D. Salvucci,et al.  Ieee standard for binary floating-point arithmetic , 1985 .

[11]  Warren James,et al.  1 GHz HAL SPARC64/sup R/ Dual Floating Point Unit with RAS features , 2001, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[12]  Karl S. Hemmert,et al.  Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[13]  Yvon Savaria,et al.  A flexible floating-point format for optimizing data-paths and operators in FPGA based DSPs , 2002, FPGA '02.

[14]  Russell Tessier,et al.  Floating point unit generation and evaluation for FPGAs , 2003, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003..

[15]  Brent E. Nelson,et al.  Tradeoffs of designing floating-point division and square root on Virtex FPGAs , 2003, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003..

[16]  Ansi Ieee,et al.  IEEE Standard for Binary Floating Point Arithmetic , 1985 .

[17]  Barry S. Fagin,et al.  Field programmable gate arrays and floating point arithmetic , 1994, IEEE Trans. Very Large Scale Integr. Syst..

[18]  Keith D. Underwood,et al.  FPGAs vs. CPUs: trends in peak floating-point performance , 2004, FPGA '04.

[19]  Scott McMillan,et al.  A re-evaluation of the practicality of floating-point operations on FPGAs , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).

[20]  Wayne Luk,et al.  Automating Customisation of Floating-Point Designs , 2002, FPL.

[21]  Yong Dou,et al.  64-bit floating-point FPGA matrix multiplication , 2005, FPGA '05.

[22]  Wayne Luk,et al.  Floating-point bitwidth analysis via automatic differentiation , 2002, 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings..

[23]  Karl S. Hemmert,et al.  An analysis of the double-precision floating-point FFT on FPGAs , 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).

[24]  Javier D. Bruguera,et al.  Floating-point multiply-add-fused with reduced latency , 2004, IEEE Transactions on Computers.

[25]  Eric M. Schwarz,et al.  FPU implementations with denormalized numbers , 2005, IEEE Transactions on Computers.

[26]  Peter M. Athanas,et al.  Quantitative analysis of floating point arithmetic on FPGA based custom computing machines , 1995, Proceedings IEEE Symposium on FPGAs for Custom Computing Machines.

[27]  Norman P. Jouppi,et al.  The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays , 2002, ISCA.

[28]  Viktor K. Prasanna,et al.  A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing , 2005, ERSA.

[29]  André DeHon,et al.  Floating-point sparse matrix-vector multiply for FPGAs , 2005, FPGA '05.

[30]  Margaret Martonosi,et al.  Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques , 2000, IEEE Trans. Computers.