Numerical Reproducibility and Accuracy at ExaScale

Given current hardware trends, ExaScale computing (1018 floating point operations per second) is projected to be available in less than a decade, achieved by using a huge number of processors, of order 109. Given the likely hardware heterogeneity in both platform and network, and the possibility of intermittent failures, dynamic scheduling will be needed to adapt to changing resources and loads. This will make it likely that repeated runs of a program will not execute operations like reductions in exactly the same order. This in turn will make reproducibility, i.e. getting bitwise identical results from run to run, difficult to achieve, because floating point operations like addition are not associative, so computing sums in different orders often leads to different results. Indeed, this is already a challenge on today's platforms.

[1]  James Demmel,et al.  Fast Reproducible Floating-Point Summation , 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.

[2]  Siegfried M. Rump,et al.  Ultimately Fast Accurate Summation , 2009, SIAM J. Sci. Comput..

[3]  Yozo Hida,et al.  Accurate Floating Point Summation , 2002 .

[4]  Robert A. van de Geijn,et al.  BLAS (Basic Linear Algebra Subprograms) , 2011, Encyclopedia of Parallel Computing.