Paper Information - Algorithms for Efficient Reproducible Floating Point Summation

Algorithms for Efficient Reproducible Floating Point Summation

We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
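To make the order-dependence problem concrete, the following is a minimal Python sketch of a much simpler relative of the idea: pre-rounding every input to one shared power-of-two grid so that all additions become exact and therefore order-independent. It is not the paper's binned-number algorithm. It uses a single bin and two passes over the data, whereas the abstract describes a one-pass method with a multi-word accumulator. The function name `prerounded_sum` and the parameter `bits` are hypothetical, and the sketch assumes finite inputs, no underflow of the grid spacing, and at most 2^53 / 2^bits terms.

```python
import math

def prerounded_sum(xs, bits=40):
    """Order-independent sum of finite floats (simplified single-bin sketch).

    Assumption: len(xs) * 2**bits <= 2**53, so every partial sum is an
    exactly representable multiple of the grid spacing q.
    """
    xs = list(xs)
    if len(xs) * 2**bits > 2**53:
        raise ValueError("too many terms for this grid width")
    # Pass 1: largest magnitude decides where the grid sits.
    m = max((abs(x) for x in xs), default=0.0)
    if m == 0.0:
        return 0.0
    _, e = math.frexp(m)            # m = f * 2**e with 0.5 <= f < 1
    q = math.ldexp(1.0, e - bits)   # grid spacing, a power of two
    if q == 0.0:
        raise ValueError("grid spacing underflowed")
    # Pass 2: round each term to a multiple of q, then add.
    # Each rounded term is k*q with |k| <= 2**bits, so the additions
    # below incur no rounding error and the order cannot matter.
    total = 0.0
    for x in xs:
        total += round(x / q) * q
    return total

# Conventional summation depends on the order of the terms:
print(sum([1e16, 1.0, -1e16]), sum([1e16, -1e16, 1.0]))
# The pre-rounded sum returns the same value for both orders:
print(prerounded_sum([1e16, 1.0, -1e16]), prerounded_sum([1e16, -1e16, 1.0]))
```

The price of this single-bin version is accuracy: any term smaller than half the grid spacing is rounded away entirely, which is the kind of loss that using more accumulator words is meant to reduce.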

James Demmel | Peter Ahrens | Hong Nguyen
