Multiplicative Square Root Algorithms for FPGAs

Most current square root implementations for FPGAs use a digit recurrence algorithm, which is well suited to their LUT structure. However, recent computing-oriented FPGAs include embedded multipliers and RAM blocks, which can also be used to implement quadratic convergence algorithms, very high radix digit recurrences, or polynomial approximation algorithms. The cost of these solutions is evaluated and compared, and a complete implementation of a polynomial approach is presented within the open-source FloPoCo framework. This polynomial approach achieves shorter latency and higher frequency than the digit recurrence approach, and improves on previous multiplicative approaches. However, the cost of IEEE-compliant correct rounding is shown to be very high.
