Variable Precision Floating Point Reciprocal, Division and Square Root for Major FPGA Vendors

Variable precision floating point operations have applications in many fields, including scientific computing and signal processing. Field Programmable Gate Arrays (FPGAs) are a good platform for accelerating such applications because of their flexibility, their lower development time and cost compared to Application Specific Integrated Circuits (ASICs), and their lower power consumption compared to Graphics Processing Units (GPUs). Scientists are increasingly interested in variable precision floating point operations on FPGAs that are not limited to single or double precision, so that these operations can be made more resource efficient and better suited to their own applications. Among these operations, the performance of reciprocal, division and square root depends strongly on the algorithm implemented, and it can significantly affect the overall performance of the application that uses them. In this thesis, we improve these three operations using a table-based approach. Our implementation is written in VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) and targets FPGAs. The components have been implemented in the development environments of both Altera and Xilinx, the two major FPGA vendors. These implementations also provide a good trade-off among hardware resource utilization, maximum clock frequency and latency, and users can change the latency by adjusting the parameters of the components. In addition to supporting the IEEE 754 standard representations, which include single and double precision, these components can be customized by
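The abstract does not spell out the table-based algorithm. As a hedged illustration of the general idea only, the Python sketch below uses a small lookup table, indexed by the leading mantissa bits, to seed an approximation of 1/m or 1/sqrt(m), and then refines that seed with Newton-Raphson iterations; division is computed as a times 1/b. This table-plus-Newton combination is a common textbook technique and is not necessarily the method used in this thesis. `TABLE_BITS`, `DEFAULT_ITERATIONS` and all function names are hypothetical, and Python doubles stand in for the custom-width fixed-point datapath that a VHDL implementation would use.

```python
import math

# Hypothetical parameters (not from the thesis): seed-table index width and
# default number of Newton-Raphson refinement steps.
TABLE_BITS = 8
DEFAULT_ITERATIONS = 3


def _build_tables(k):
    """Precompute reciprocal and inverse-square-root seeds for m in [1, 2).

    Entry i covers [1 + i/2**k, 1 + (i+1)/2**k) and stores the value at the
    interval midpoint, much as a ROM generator would.
    """
    recip, rsqrt = [], []
    for i in range(1 << k):
        mid = 1.0 + (i + 0.5) / (1 << k)
        recip.append(1.0 / mid)
        rsqrt.append(1.0 / math.sqrt(mid))
    return recip, rsqrt


RECIP_TABLE, RSQRT_TABLE = _build_tables(TABLE_BITS)
SQRT2 = math.sqrt(2.0)  # stored constant, used when the exponent is odd


def _split(x):
    """Return (m, e) with x == m * 2**e and m in [1, 2), for finite x > 0."""
    m, e = math.frexp(x)  # frexp returns m in [0.5, 1)
    return m * 2.0, e - 1


def _index(m):
    """Leading TABLE_BITS fraction bits of a mantissa m in [1, 2)."""
    return int((m - 1.0) * (1 << TABLE_BITS))


def reciprocal(x, iterations=DEFAULT_ITERATIONS):
    if x == 0.0 or not math.isfinite(x):
        raise ValueError("sketch handles finite non-zero inputs only")
    m, e = _split(abs(x))
    y = RECIP_TABLE[_index(m)]       # seed: relative error <= 2**-(TABLE_BITS+1)
    for _ in range(iterations):
        y = y * (2.0 - m * y)        # Newton-Raphson: relative error squares
    return math.copysign(math.ldexp(y, -e), x)   # 1/x = (1/m) * 2**-e


def divide(a, b, iterations=DEFAULT_ITERATIONS):
    """a / b computed as a * (1 / b)."""
    return a * reciprocal(b, iterations)


def sqrt(x, iterations=DEFAULT_ITERATIONS):
    if x <= 0.0 or not math.isfinite(x):
        raise ValueError("sketch handles finite positive inputs only")
    m, e = _split(x)
    y = RSQRT_TABLE[_index(m)]       # seed for 1/sqrt(m)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * m * y * y)  # Newton-Raphson step for 1/sqrt(m)
    s = m * y                        # sqrt(m) = m * (1/sqrt(m))
    if e % 2:                        # odd e: sqrt(2**e) = 2**(e//2) * sqrt(2)
        s *= SQRT2
    return math.ldexp(s, e // 2)
```

In this sketch, each extra iteration adds a multiply, subtract and multiply stage (more latency in a pipelined design) while roughly doubling the number of correct bits, and widening `TABLE_BITS` buys a better seed, and possibly fewer iterations, at the cost of a larger table. This mirrors the resource, clock frequency and latency trade-off the abstract describes.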
