Implementation of binary floating-point arithmetic on embedded integer processors - Polynomial evaluation-based algorithms and certified code generation

Today, some embedded systems still do not integrate their own floating-point unit, because of area, cost, or energy-consumption constraints. However, this kind of architecture is widely used in application domains that are highly demanding on floating-point computations (multimedia, audio and video, telecommunications). To compensate for this lack of floating-point hardware, floating-point arithmetic has to be emulated efficiently in software. This thesis addresses the design and implementation of efficient software support for IEEE 754 floating-point arithmetic on embedded integer processors. More specifically, it proposes new algorithms and tools for the efficient generation of fast and certified programs, which in particular make it possible to obtain very low-latency C code for polynomial evaluation in fixed-point arithmetic. Compared to fully hand-written implementations, these tools significantly reduce the development time of floating-point operators.

The first part of the thesis deals with the design of optimized algorithms for some binary floating-point operators, and details their software implementation for the binary32 floating-point format on embedded VLIW integer processors such as those of the STMicroelectronics ST200 family. In particular, we propose a uniform approach for correctly-rounded roots and their reciprocals, together with an extension to division. Our approach, which relies on the evaluation of a single bivariate polynomial, exposes more instruction-level parallelism (ILP) than previous methods and turns out to be particularly efficient in practice. This work allowed us to produce a fully revised version of the FLIP library, with significant gains over the previous version.

The second part of the thesis presents a methodology for automatically and efficiently generating fast and certified C code for evaluating bivariate polynomials in fixed-point arithmetic. In particular, it relies on heuristics for computing highly parallel, low-latency evaluation schemes, together with techniques for checking that these schemes remain efficient on a real target and are accurate enough to ensure correct rounding of the underlying operator implementations. This approach has been implemented in the software tool CGPE (Code Generation for Polynomial Evaluation), which we used to quickly generate and certify significant parts of the FLIP code.
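
The abstract contrasts sequential evaluation with highly parallel, low-latency schemes. As a minimal sketch (not code taken from FLIP nor output produced by CGPE), the C fragment below evaluates the same degree-3 polynomial in fixed-point arithmetic in two ways: with Horner's rule, and with a classical Estrin-like scheme that exposes more ILP. The unsigned Q0.32 format (values in [0,1) scaled by 2^32), the helper `mul_q32`, and the function names `eval_horner` and `eval_estrin` are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical helper: truncated product of two unsigned Q0.32 numbers
   (values in [0,1) scaled by 2^32). Only the high 32 bits of the exact
   64-bit product are kept, so the result is rounded toward zero. */
static inline uint32_t mul_q32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Horner's rule for p(x) = a[0] + a[1]x + a[2]x^2 + a[3]x^3.
   Each step needs the previous result, so the critical path is
   3 multiplications and 3 additions, strictly in sequence.
   Assumes the coefficients are small enough that no partial sum
   reaches 2^32 (i.e. 1.0 in Q0.32). */
uint32_t eval_horner(const uint32_t a[4], uint32_t x)
{
    uint32_t r = a[3];
    r = a[2] + mul_q32(r, x);
    r = a[1] + mul_q32(r, x);
    r = a[0] + mul_q32(r, x);
    return r;
}

/* Estrin-like scheme: p(x) = (a[0] + a[1]x) + (a[2] + a[3]x) * x^2.
   The products a[1]x, a[3]x and x^2 are independent of one another,
   so a multi-issue core can compute them concurrently; the critical
   path shrinks to 2 multiplications and 2 additions, at the cost of
   one extra product. Same no-overflow assumption as above. */
uint32_t eval_estrin(const uint32_t a[4], uint32_t x)
{
    uint32_t x2 = mul_q32(x, x);            /* independent of lo and hi */
    uint32_t lo = a[0] + mul_q32(a[1], x);
    uint32_t hi = a[2] + mul_q32(a[3], x);
    return lo + mul_q32(hi, x2);
}
```

For example, with coefficients 0.25, 0.25, 0.125, 0.125 (0x40000000, 0x40000000, 0x20000000, 0x20000000) and x = 0.5 (0x80000000), both functions return 0x6C000000, i.e. 0.421875 = p(0.5), because every product is exact. For general inputs the two schemes can differ in the last bits, since their truncation errors accumulate differently; this is why a low-latency scheme has to be checked for accuracy as well as for speed.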