Numerical reproducibility for the parallel reduction on multi- and many-core architectures
[1] Ulrich W. Kulisch, et al. Comments on Fast and Exact Accumulation of Products, 2010, PARA.
[2] James Demmel, et al. IEEE Standard for Floating-Point Arithmetic, 2008.
[3] Alex Fit-Florea, et al. Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs, 2011.
[4] Wayne B. Hayes, et al. Algorithm 908, 2010.
[5] James Demmel, et al. Fast Reproducible Floating-Point Summation, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[6] Guillaume Melquiond, et al. Emulation of a FMA and Correctly Rounded Sums: Proved Algorithms Using Rounding to Odd, 2008, IEEE Transactions on Computers.
[7] James Demmel, et al. Design, implementation and testing of extended and mixed precision BLAS, 2000, TOMS.
[8] James Demmel, et al. Numerical Reproducibility and Accuracy at ExaScale, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[9] Jean-Michel Muller, et al. Handbook of Floating-Point Arithmetic (2nd Ed.), 2018.
[10] Nicholas J. Higham, et al. Inverse Problems Newsletter, 1991.
[11] Donald E. Knuth, et al. The art of computer programming. Vol. 2: Seminumerical algorithms, 1981.
[12] Vincent Lefèvre, et al. MPFR: A multiple-precision binary floating-point library with correct rounding, 2007, TOMS.
[13] Radford M. Neal. Fast exact summation using small and large superaccumulators, 2015, ArXiv.
[14] James Reinders, et al. Intel® threading building blocks, 2008.
[15] David Defour, et al. Reproducible and Accurate Matrix Multiplication for GPU Accelerators, 2015.
[16] Jonathan M. Borwein, et al. High-precision computation: Mathematical physics and dynamics, 2010, Appl. Math. Comput.
[17] David Thomas, et al. The Art in Computer Programming, 2001.
[18] Anna Gavling, et al. The ART at, 2008.
[19] Rodney A. Kennedy, et al. Efficient Histogram Algorithms for NVIDIA CUDA Compatible Devices, 2007.
[20] James Demmel, et al. Parallel Reproducible Summation, 2015, IEEE Transactions on Computers.
[21] J. Muller, et al. CR-LIBM A library of correctly rounded elementary functions in double-precision, 2006.
[22] David Defour, et al. Software Carry-Save for Fast Multiple-Precision Algorithms, 2002.
[23] Jim Euchner. Design, 2014, Catalysis from A to Z.
[24] Jonathan Richard Shewchuk, et al. Robust adaptive floating-point geometric predicates, 1996, SCG '96.
[25] David Defour, et al. Reproducible Triangular Solvers for High-Performance Computing, 2015, 2015 12th International Conference on Information Technology - New Generations.
[26] Siegfried M. Rump, et al. Ultimately Fast Accurate Summation, 2009, SIAM J. Sci. Comput.
[27] Ulrich W. Kulisch, et al. The exact dot product as basic tool for long interval arithmetic, 2011, Computing.
[28] Xiaoye S. Li, et al. Algorithms for quad-double precision floating point arithmetic, 2000, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.
[29] Torsten Hoefler, et al. Designing Bit-Reproducible Portable High-Performance Applications, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.