Accurately computing the log-sum-exp and softmax functions

Evaluating the log-sum-exp function or the softmax function is a key step in many modern data science algorithms, notably in inference and classification. Because of the exponentials that these functions contain, the evaluation is prone to overflow and underflow, especially in low precision arithmetic. Software implementations commonly use alternative formulas that avoid overflow and reduce the chance of harmful underflow, employing a shift or another rewriting. Although mathematically equivalent, these variants behave differently in floating-point arithmetic, and shifting can introduce subtractive cancellation. We give rounding error analyses of different evaluation algorithms and interpret the error bounds using condition numbers for the functions. We conclude, based on the analysis and numerical experiments, that the shifted formulas are of similar accuracy to the unshifted ones, and so can safely be used, but that a division-free variant of softmax can suffer from loss of accuracy.
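For concreteness, a minimal Python sketch of the shifted formulas described above, contrasted with the division-free softmax variant; the function names and test vector are illustrative, not taken from the paper, and this is not the paper's reference implementation.

```python
import numpy as np

def logsumexp_shifted(x):
    # Shifted formula: log(sum_i exp(x_i)) = m + log(sum_i exp(x_i - m)), m = max_i x_i.
    # Every exponent is <= 0, so exp cannot overflow; terms that underflow to zero
    # contribute negligibly to the sum.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax_shifted(x):
    # Shifted softmax: exp(x_i - m) / sum_j exp(x_j - m), mathematically identical
    # to the unshifted formula but safe from overflow.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def softmax_division_free(x):
    # Division-free variant: softmax_i = exp(x_i - logsumexp(x)).
    # This is the kind of rewriting the abstract warns can lose accuracy.
    return np.exp(x - logsumexp_shifted(x))

x = np.array([1000.0, 1000.5, 999.0])   # naive exp(x) overflows in double precision
print(logsumexp_shifted(x))             # finite value; the unshifted formula gives inf
print(softmax_shifted(x))               # entries sum to 1
```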
