High Performance Reconfigurable Architecture for Double Precision Floating Point Division

Floating-point arithmetic (FPA) is an important target for hardware acceleration because it is used across a wide range of applications. Division is among the most demanding FPA operations in terms of complexity, area, and speed. This paper presents an efficient FPGA implementation of double-precision floating-point division, targeting the Virtex-II Pro FPGA platform to ease comparison with prior work. The proposed method is based on binomial expansion and uses look-up tables and partial block multipliers (PBM). Compared with previously reported work, the proposed design occupies a smaller area (in terms of the number of slices, the number of multipliers, and BRAM usage) while achieving higher performance and lower latency. Over more than 5 million unique random test cases, the proposed design gives an average error of less than 0.5 ULP (unit in the last place) and a maximum error of 2 ULP without any rounding scheme. Rounding can also be added to the design to restore some accuracy at a slight cost in area.
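To illustrate the general idea of division by binomial expansion with a look-up table, the following is a minimal software sketch and not the paper's hardware design. It assumes the divisor mantissa y in [1, 2) is split as y = y1 + y2, where y1 holds the leading bits and indexes a table of 1/y1, so that 1/y = (1/y1)(1 - t + t^2 - ...) with t = y2/y1. The table width `M_BITS`, the number of series terms `TERMS`, and all function names are hypothetical choices; the abstract does not state the actual parameters or how the partial block multipliers are arranged. Exact rational arithmetic is used here, so the truncation of intermediate products that real hardware performs is not modeled.

```python
from fractions import Fraction
import random

M_BITS = 8   # hypothetical: leading fraction bits used to index the table
TERMS = 8    # hypothetical: number of binomial series terms kept

# Look-up table of 1/y1 for y1 = 1 + i / 2^M_BITS (stored exactly here).
RECIP_TABLE = [Fraction(2**M_BITS, 2**M_BITS + i) for i in range(2**M_BITS)]


def approx_reciprocal(y: Fraction) -> Fraction:
    """Approximate 1/y for 1 <= y < 2 with a truncated binomial series."""
    idx = int((y - 1) * 2**M_BITS)        # leading M_BITS of the fraction
    y1 = 1 + Fraction(idx, 2**M_BITS)     # part of y covered by the table
    y2 = y - y1                           # remainder, 0 <= y2 < 2^-M_BITS
    r = RECIP_TABLE[idx]                  # table value 1/y1
    t = y2 * r                            # t = y2/y1 < 2^-M_BITS
    # 1/(1 + t) = 1 - t + t^2 - ... truncated after TERMS terms
    series = sum((-t) ** k for k in range(TERMS))
    return r * series


def divide_mantissas(x: Fraction, y: Fraction) -> Fraction:
    """Approximate x / y for mantissas in [1, 2) via the reciprocal above."""
    return x * approx_reciprocal(y)


def relative_error(x: Fraction, y: Fraction) -> Fraction:
    """Relative error of the approximate quotient against the exact one."""
    exact = x / y
    return abs(divide_mantissas(x, y) - exact) / exact


def random_mantissa(rng: random.Random) -> Fraction:
    """Random double-precision mantissa 1.f with a 52-bit fraction."""
    return 1 + Fraction(rng.getrandbits(52), 2**52)


if __name__ == "__main__":
    rng = random.Random(0)
    worst = max(relative_error(random_mantissa(rng), random_mantissa(rng))
                for _ in range(1000))
    print("worst relative error below 2^-64:", worst < Fraction(1, 2**64))
```

Under these assumptions the relative error of the truncated series equals t^TERMS, which is below 2^-(M_BITS * TERMS). A larger table lets the series be shortened, which is the kind of table-versus-multiplier trade-off such designs must balance.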
