Elementary functions: towards automatically generated, efficient, and vectorizable implementations. (Fonctions élémentaires: vers des implémentations vectorisables, efficaces, et automatiquement générées)

Elementary mathematical functions are pervasive in high-performance computing programs. The mathematical libraries (libms) these programs rely on generally provide several flavors of the same function, but those flavors are fixed at implementation time. This monolithic design hinders the performance of the programs that use libms, because libms are built to be versatile at the expense of specific optimizations. Moreover, the duplication of shared patterns in the source code makes such code bases harder to maintain and more error-prone. A current challenge is to propose "meta-tools" that automatically generate high-performance code for the evaluation of elementary functions. These tools must let generic and efficient algorithms be reused across different flavors of functions and different hardware architectures. It then becomes possible to generate optimized, tailored libms from factorized generative code, which eases maintenance. First, we propose a novel algorithm for generating lookup tables that are free of rounding errors for trigonometric and hyperbolic functions. Then, we study the performance of vectorized polynomial evaluation schemes, a first step towards generating efficient vectorized elementary functions. Finally, we develop a meta-implementation of a vectorized logarithm, which factors code generation across different formats and architectures. Our contributions are shown to be competitive with free and commercial solutions, which is a strong incentive to pursue this new paradigm.
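The abstract refers to vectorized polynomial evaluation schemes. As a hedged illustration, not the thesis's own code, the sketch below contrasts two classic schemes. Horner's rule forms one strictly sequential chain of multiply-adds. Estrin's scheme pairs coefficients and combines the pairs with x², x⁴, and so on. This costs a few extra multiplications (the squarings) but shortens the dependency chain to roughly log2(n+1) levels. It is written in Python for readability, and the function names `horner` and `estrin` are our own.

```python
from typing import Sequence


def horner(coeffs: Sequence[float], x: float) -> float:
    """Evaluate p(x) = c0 + c1*x + ... + cn*x^n with Horner's rule.

    Each multiply-add depends on the previous one, so the n steps
    form a single sequential dependency chain.
    """
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs: Sequence[float], x: float) -> float:
    """Evaluate the same polynomial with Estrin's scheme.

    At each level, adjacent terms are paired as t[2i] + t[2i+1] * power,
    where power is x, then x^2, then x^4, ... The pairs within one level
    are independent of each other, so the dependency depth is about
    log2(n + 1) levels instead of n.
    """
    terms = list(coeffs)
    power = x
    while len(terms) > 1:
        if len(terms) % 2:  # pad an odd-length level with a zero coefficient
            terms.append(0.0)
        # every pair at this level could be computed in parallel
        terms = [terms[i] + terms[i + 1] * power
                 for i in range(0, len(terms), 2)]
        power = power * power
    return terms[0] if terms else 0.0


if __name__ == "__main__":
    # degree-4 Taylor polynomial of exp around 0 (illustrative coefficients only)
    coeffs = [1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24]
    print(horner(coeffs, 0.5), estrin(coeffs, 0.5))
```

The two schemes perform their roundings in a different order, so their floating-point results may differ in the last bits. In a vectorized libm, each SIMD lane typically evaluates the polynomial at a different x. The shorter chains of Estrin-like schemes can then help keep pipelined multiply-add units busy. Which scheme is faster in practice depends on the degree and the target architecture.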