Architectural Modifications to Improve Floating-Point Unit Efficiency in FPGAs

FPGAs have reached densities that can implement floating-point applications, but floating-point operations still require a large amount of FPGA resources. One major component of IEEE-compliant floating-point computation is the variable-length shifter. These shifters account for over 30% of a double-precision floating-point adder and 25% of a double-precision multiplier. This paper introduces two alternatives for implementing them. The first is a coarse-grained approach: embedding variable-length shifters in the FPGA fabric. These units provide significant area savings with a modest clock rate improvement over existing architectures. The second is a fine-grained approach: adding a 4:1 multiplexer inside the slices, in parallel with the LUTs. While the area savings are more modest, these multiplexers provide a significant boost in clock rate with only a small impact on the FPGA fabric.
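
To illustrate why a 4:1 multiplexer suits variable-length shifting, the following is a minimal behavioral sketch in Python, not the paper's hardware design. It models a logical right shifter in which each stage consumes two bits of the shift amount through one 4:1 multiplexer, so a 64-bit shift needs three stages rather than the six that 2:1 multiplexers would require. The names `mux4` and `barrel_shift_right` are hypothetical and chosen only for this example.

```python
def mux4(sel, a, b, c, d):
    """Behavioral 4:1 multiplexer: a 2-bit select picks one of four inputs."""
    return (a, b, c, d)[sel]


def barrel_shift_right(value, amount, width=64):
    """Logical right shift modeled as cascaded stages of 4:1 multiplexers.

    Stage k shifts by 0, 1, 2, or 3 times 4**k, selected by two bits of
    `amount`. Shift-amount bits beyond what the stages cover are ignored.
    """
    value &= (1 << width) - 1       # truncate the input to the datapath width
    stage_shift = 1                 # shift granularity of the current stage
    while stage_shift < width:
        sel = amount & 0b11         # two select bits drive this stage
        value = mux4(
            sel,
            value,
            value >> stage_shift,
            value >> (2 * stage_shift),
            value >> (3 * stage_shift),
        )
        amount >>= 2                # move on to the next two select bits
        stage_shift *= 4
    return value


if __name__ == "__main__":
    x = 0x8000_0000_0000_0001
    # Compare the staged model against Python's built-in shift.
    for n in range(64):
        assert barrel_shift_right(x, n) == x >> n
    print("staged 4:1 shifter matches >> for all 64 shift amounts")
```

In a floating-point adder, shifters of this kind would align the mantissas before the addition and normalize the result afterward, which is why they make up such a large share of the unit.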
