Paper information - Fast Reproducible Floating-Point Summation

Fast Reproducible Floating-Point Summation

Reproducibility, i.e. getting bitwise identical floating-point results from multiple runs of the same program, is a property that many users depend on for debugging or correctness checking in many codes [1]. However, the combination of dynamic scheduling of parallel computing resources and floating-point nonassociativity makes attaining reproducibility a challenge even for simple reduction operations such as computing the sum of a vector of numbers in parallel. We propose a technique for floating-point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [2] and is much more efficient than using (possibly very) high-precision arithmetic. Our algorithm trades off efficiency and accuracy: we reproducibly attain reasonably accurate results (with an absolute error bound of c · n^2 · macheps · max |v_i| for a small constant c) with just 2n + O(1) floating-point operations, and quite accurate results (with an absolute error bound of c · n^3 · macheps^2 · max |v_i|) with 5n + O(1) floating-point operations, both with just two reduction operations. Higher accuracy is also possible by increasing the number of error-free transformations. As long as the same rounding mode is used, results computed by the proposed algorithms are reproducible for any run on any platform.
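To make the order-independence claim concrete, here is a minimal sketch of one level of pre-rounding against a common boundary, in the spirit of the error-free extraction step the abstract attributes to Rump. It is not the authors' implementation. The function name `prerounded_sum`, the choice of boundary `sigma`, and the loop structure are assumptions made for illustration. The sketch assumes finite IEEE 754 double inputs, round-to-nearest, and no overflow or underflow.

```python
import math
import random


def prerounded_sum(values):
    """Sum floats so that the result does not depend on the order of `values`.

    Illustrative sketch only: assumes finite inputs, round-to-nearest,
    and no overflow or underflow.
    """
    n = len(values)
    # Reduction 1: the maximum magnitude is the same for every ordering.
    m = max((abs(v) for v in values), default=0.0)
    if m == 0.0:
        return 0.0

    _, e = math.frexp(m)       # m < 2**e
    p = (n - 1).bit_length()   # n <= 2**p
    # Boundary 1.5 * 2**(e+p+1): sigma + v stays inside one binade for all v,
    # so every v is rounded onto the same grid of spacing ulp(sigma).
    sigma = math.ldexp(1.5, e + p + 1)

    total = 0.0
    for v in values:
        q = (sigma + v) - sigma  # v rounded to the grid; the subtraction is exact
        # Reduction 2: all partial sums are grid multiples bounded by 2**(e+p),
        # so they fit in 53 bits and each addition is exact.
        total += q
    return total


if __name__ == "__main__":
    data = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
    a = prerounded_sum(data)
    random.shuffle(data)
    b = prerounded_sum(data)
    print(a == b)  # both orderings produce the same bits
```

Because every addition in the second loop is exact, any summation order (or any parallel reduction tree) yields the same value. The price is the rounding of each input onto the grid, which for this sketch gives an absolute error of at most about n^2 · 2^-50 · max |v_i|, the same shape as the first bound quoted in the abstract. Applying the same extraction again to the leftover parts `v - q` would be one way to recover more accuracy, at the cost of more operations.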

James Demmel | Hong Diep Nguyen

[1] Siegfried M. Rump et al. Ultimately Fast Accurate Summation, 2009, SIAM J. Sci. Comput.

[2] Guy E. Blelloch et al. Prefix sums and their applications, 1990.

[3] Siegfried M. Rump et al. Fast high precision summation, 2010.

[4] Mei Han An et al. Accuracy and stability of numerical algorithms, 1991.

[5] T. J. Dekker et al. A floating-point technique for extending the available precision, 1971.

[6] Siegfried M. Rump et al. Accurate Floating-Point Summation Part I: Faithful Rounding, 2008, SIAM J. Sci. Comput.

[7] Siegfried M. Rump et al. INTLAB - INTerval LABoratory, 1998, SCAN.

[8] Jürgen Wolff von Gudenberg et al. A long accumulator like a carry-save adder, 2011, Computing.

[9] Sriram Krishnamoorthy et al. Effects of floating-point non-associativity on numerical computations on massively multithreaded systems, 2009.

[10] T. Csendes. Developments in Reliable Computing, 2000.

[11] Philip Saponaro et al. Improving numerical reproducibility and stability in large-scale numerical simulations on GPUs, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).