Paper information - Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures


On modern multi-core, many-core, and heterogeneous architectures, floating-point computations, especially reductions, may become non-deterministic and thus non-reproducible, mainly because floating-point operations are not associative. We introduce a solution to compute deterministic sums of floating-point numbers efficiently and with the best possible accuracy. Our multi-level algorithm consists of two main stages: a filtering stage that uses fast vectorized floating-point expansions, and an accumulation stage based on superaccumulators in a high-radix carry-save representation. We present implementations on recent Intel desktop and server processors, on the Intel Xeon Phi accelerator, and on AMD and NVIDIA GPUs. We show that numerical reproducibility and bit-perfect accuracy can be achieved at no additional cost for large sums that have dynamic ranges of up to 90 orders of magnitude by leveraging arithmetic units that are left underused by standard reduction algorithms.
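The two-stage idea in the abstract can be illustrated with a minimal scalar sketch in Python. It is not the paper's implementation: the function names (`two_sum`, `to_fixed`, `exact_sum`) and the `expansion_size` parameter are hypothetical, the filtering stage is not vectorized, and a Python arbitrary-precision integer stands in for the high-radix carry-save superaccumulator. The sketch assumes finite inputs and no intermediate overflow.

```python
import math
import random

SCALE_BITS = 1074  # every finite double is an integer multiple of 2**-1074


def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    b_virtual = s - a
    e = (a - (s - b_virtual)) + (b - b_virtual)
    return s, e


def to_fixed(x):
    """Convert a finite float to an exact integer count of 2**-1074 units."""
    numerator, denominator = x.as_integer_ratio()  # denominator is a power of two
    return numerator * ((1 << SCALE_BITS) // denominator)


def exact_sum(values, expansion_size=2):
    """Correctly rounded sum (assumption: finite inputs, no intermediate overflow).

    Stage 1 (filter): push each input through a small floating-point expansion.
    Stage 2 (accumulate): whatever the expansion cannot absorb goes into an
    exact fixed-point accumulator (a Python int stands in for the superaccumulator).
    """
    expansion = [0.0] * expansion_size
    superacc = 0
    for x in values:
        for i in range(expansion_size):
            expansion[i], x = two_sum(expansion[i], x)
            if x == 0.0:
                break  # fully absorbed by the expansion
        if x != 0.0:
            superacc += to_fixed(x)  # residual spills into the exact accumulator
    # Flush the expansion into the exact accumulator, then round once.
    for term in expansion:
        superacc += to_fixed(term)
    return superacc / (1 << SCALE_BITS)  # int / int division is correctly rounded


if __name__ == "__main__":
    random.seed(0)
    data = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-20, 20)
            for _ in range(1000)]
    shuffled = data[:]
    random.shuffle(shuffled)
    # The naive sums may differ between orderings; the exact sums do not.
    print(sum(data) == sum(shuffled))
    print(exact_sum(data) == exact_sum(shuffled))
```

Because every step before the final division is exact, the result does not depend on the order of the inputs, which is the reproducibility property the abstract describes. When the dynamic range of the data is moderate, most inputs are absorbed by the expansion and the slower exact accumulator is rarely touched.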
