Variable precision in mathematical and scientific computing: Proposal for an ICERM workshop

Since its introduction in the 1980s, the IEEE 754 standard for floating-point arithmetic has ably served a wide range of users: mathematicians, computer scientists, physicists, chemists, biologists, social scientists and engineers. William Kahan, who led the development of the standard, later received the ACM Turing Award for this work. The initial standard specified 32-bit (“single”) and 64-bit (“double”) floating-point arithmetic, both of which were quickly adopted by processor vendors. Even today, the vast majority of numerical computations in research and engineering employ either IEEE single or IEEE double, typically one or the other exclusively within a single application.

Recent developments, however, have highlighted the need for a broader range of precision levels, and for variable precision within a single application. A variable precision framework offers clear performance advantages: faster processing, better cache utilization, lower memory usage and reduced long-term data storage. But effective use of variable precision requires a more sophisticated mathematical framework, together with corresponding software tools and diagnostic facilities.

At the low end, the explosive rise of graphics, artificial intelligence and machine learning has underscored the utility of reduced precision. Accordingly, an IEEE 16-bit “half” precision format has been specified, with five exponent bits and ten mantissa bits. Many in the machine learning community instead use the “bfloat16” format, which has eight exponent bits and seven mantissa bits. Hardware such as NVIDIA’s tensor core units can exploit these formats to significantly increase processing rates. At the same time, researchers in the high-performance computing (HPC) field, in the drive to achieve exascale computing, are also reconsidering their usage of numerical precision.
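As a concrete illustration of the tradeoff between these formats, the short Python sketch below (not part of the proposal itself; the helper name fp_summary is introduced here for illustration) derives the machine epsilon, largest finite value and smallest positive normal value implied by the exponent and mantissa bit counts quoted above.

    def fp_summary(name, exp_bits, frac_bits):
        """Print rough precision/range figures for an IEEE-style binary format."""
        bias = 2 ** (exp_bits - 1) - 1                           # exponent bias
        eps = 2.0 ** (-frac_bits)                                # spacing of values just above 1.0
        max_finite = (2.0 - 2.0 ** (-frac_bits)) * 2.0 ** bias   # largest finite value
        min_normal = 2.0 ** (1 - bias)                           # smallest positive normal value
        print(f"{name:>8}: eps = {eps:.2e}, max = {max_finite:.2e}, min normal = {min_normal:.2e}")

    # exponent bits and mantissa (fraction) bits as quoted in the text
    fp_summary("fp16",     5, 10)   # IEEE half precision
    fp_summary("bfloat16", 8,  7)   # bfloat16: wider range, fewer mantissa bits
    fp_summary("fp32",     8, 23)   # IEEE single
    fp_summary("fp64",    11, 52)   # IEEE double

Run as written, this shows why bfloat16 is attractive for machine learning: it retains the full dynamic range of single precision (largest value about 3.4e38) at the cost of precision (eps about 8e-3), whereas IEEE half precision keeps more mantissa bits but overflows beyond roughly 65504.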
