Computing Floating-Point Square Roots via Bivariate Polynomial Evaluation

In this paper, we show how to reduce the computation of correctly rounded square roots of binary floating-point data to the fixed-point evaluation of some particular integer polynomials in two variables. By designing parallel and accurate evaluation schemes for such bivariate polynomials, we further show that this approach exposes a high degree of instruction-level parallelism (ILP) and thus allows for potentially low-latency implementations. Then, as an illustration, we detail a C implementation of our method for IEEE 754-2008 binary32 floating-point data (called single precision in the 1985 version of the IEEE 754 standard). This software implementation, which assumes 32-bit unsigned integer arithmetic only, is almost complete: it supports special operands, subnormal numbers, and all rounding-direction attributes, but not exception handling (that is, status flags are not set). Finally, we have carried out experiments with this implementation on the ST231, an integer processor from STMicroelectronics' ST200 family, using the ST200 family VLIW compiler. The results demonstrate the practical interest of our approach in that context: for all rounding-direction attributes, the generated assembly code is optimally scheduled and indeed has low latency (23 cycles).
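To make the ILP argument concrete, here is a minimal C sketch that evaluates one bivariate polynomial in fixed point in two ways. It is not the paper's scheme. The form P(s,t) = s·(a0 + a1·t + a2·t² + a3·t³), the coefficient values, the function names, and the fixed-point formats (t in Q0.32, s in Q1.31 with 1 ≤ s < 2, coefficients in Q2.30, result in Q3.29) are all assumptions chosen for illustration. The helper `mulhi` uses a 64-bit intermediate for portability only; a target with a 32×32-bit multiply returning the high word would not need it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical coefficients in Q2.30 (value = integer / 2^30).
 * They are illustrative only and are NOT taken from the paper. */
#define A0 0x40000000u /* 1.0    */
#define A1 0x20000000u /* 0.5    */
#define A2 0x08000000u /* 0.125  */
#define A3 0x04000000u /* 0.0625 */

/* High 32 bits of a 32x32-bit unsigned product: floor(a*b / 2^32). */
static uint32_t mulhi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Horner's rule: four multiplications, each waiting on the previous one. */
uint32_t eval_horner(uint32_t s, uint32_t t)
{
    assert(s >= 0x80000000u);      /* s in [1, 2) in Q1.31 */
    uint32_t r = A3;
    r = A2 + mulhi(r, t);          /* Q2.30 */
    r = A1 + mulhi(r, t);          /* Q2.30 */
    r = A0 + mulhi(r, t);          /* Q2.30 */
    return mulhi(s, r);            /* Q3.29 */
}

/* Parallel split: six multiplications, but only three dependent levels. */
uint32_t eval_parallel(uint32_t s, uint32_t t)
{
    assert(s >= 0x80000000u);      /* s in [1, 2) in Q1.31 */

    /* Level 1: three independent multiplications. */
    uint32_t t2 = mulhi(t, t);           /* t^2,       Q0.32 */
    uint32_t lo = A0 + mulhi(A1, t);     /* a0 + a1*t, Q2.30 */
    uint32_t hi = A2 + mulhi(A3, t);     /* a2 + a3*t, Q2.30 */

    /* Level 2: two independent multiplications. */
    uint32_t u   = mulhi(s, lo);         /* s*lo,  Q3.29 */
    uint32_t st2 = mulhi(s, t2);         /* s*t^2, Q1.31 */

    /* Level 3: one multiplication and the final addition. */
    return u + mulhi(hi, st2);           /* Q3.29 */
}
```

Because every product is truncated, the two functions may differ by a few units in the last place of the Q3.29 result. The parallel version trades two extra multiplications for a shorter dependency chain, which is the kind of trade-off a VLIW target with several issue slots can exploit.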
