Paper information - Meta-implementation of vectorized logarithm function in binary floating-point arithmetic

Besides scalar instructions, modern micro-architectures also support vector instructions, which process several packed inputs (typically 4 or 8) in a single instruction. The challenge is to write vector programs for mathematical functions such as sin, cos, exp, and log that exploit those vector instructions efficiently. This article focuses on the design of a vectorized implementation of the log(x) function, and more particularly on automating that design for different formats and micro-architectures. First, it rewrites a classic range reduction in a branchless fashion, so as to make the best use of recent micro-architecture features such as the rcp (reciprocal) instruction and to treat all inputs in the same flow. Second, it details rigorously how to achieve “faithfully rounded” implementations. Third, it shows how to automate this implementation process using the MetaLibm framework on micro-architectures supporting SSE/AVX and AVX2. Finally, it illustrates that this process yields high-throughput implementations for the binary32 and binary64 formats in a fully automated way.
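To make the idea of a branchless, reciprocal-based range reduction concrete, here is a minimal scalar sketch in Python. It is not the paper's algorithm or MetaLibm output; it only illustrates the textbook identity log(x) = e·log(2) − log(r) + log1p(m·r − 1), where x = m·2^e and r ≈ 1/m. The names `approx_rcp` and `log_sketch` are hypothetical, and several things are assumptions: the 12-bit precision of the emulated reciprocal, the degree-4 polynomial, and the use of `math.log(r)` as a stand-in for a lookup table. Inputs are assumed to be positive, finite, normal numbers.

```python
import math

LN2 = math.log(2.0)
SQRT2 = math.sqrt(2.0)

def approx_rcp(m, bits=12):
    """Stand-in for a hardware rcp: 1/m rounded to `bits` significant bits."""
    mant, exp = math.frexp(1.0 / m)      # 1/m = mant * 2**exp, 0.5 <= mant < 1
    scale = 1 << bits
    return math.ldexp(round(mant * scale) / scale, exp)

def log_sketch(x):
    """Natural log of a positive, finite, normal float (no special cases)."""
    mant, exp = math.frexp(x)            # x = mant * 2**exp, mant in [0.5, 1)
    m, e = 2.0 * mant, exp - 1           # x = m * 2**e, m in [1, 2)
    # Branchless recentring: a 0/1 mask replaces "if m > sqrt(2)".
    t = int(m > SQRT2)
    m *= 1.0 - 0.5 * t                   # exact: multiplies by 1 or by 0.5
    e += t                               # now m is in (sqrt(2)/2, sqrt(2)]
    r = approx_rcp(m)                    # r is close to 1/m
    u = m * r - 1.0                      # small; an FMA would make this exact
    # log1p(u) by a degree-4 Taylor polynomial in Horner form
    p = u * (1.0 + u * (-0.5 + u * (1.0 / 3.0 - 0.25 * u)))
    # log(x) = e*log(2) - log(r) + log1p(u); math.log(r) stands in for a table
    return e * LN2 - math.log(r) + p

# Every "lane" follows the same instruction flow, whatever its value.
lanes = [0.75, 1.0, 3.5, 1.0e10]
results = [log_sketch(x) for x in lanes]
```

Because the recentring uses a mask rather than a conditional, every input takes the same path, which is the property a SIMD implementation needs. This sketch makes no claim of faithful rounding: that would require the rigorous error analysis the abstract refers to.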