Reproducible and Accurate Matrix Multiplication for GPU Accelerators
[1] Alex Fit-Florea, et al. Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs, 2011.
[2] James Demmel, et al. Numerical Reproducibility and Accuracy at ExaScale, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[3] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[4] Ulrich W. Kulisch, et al. The exact dot product as basic tool for long interval arithmetic, 2011, Computing.
[5] Xiaoye S. Li, et al. Algorithms for quad-double precision floating point arithmetic, 2000, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.
[6] Paolo Bientinesi, et al. Solving sequences of generalized least-squares problems on multi-threaded architectures, 2014, Appl. Math. Comput.
[7] James Demmel, et al. IEEE Standard for Floating-Point Arithmetic, 2008.
[8] James Demmel, et al. Design, implementation and testing of extended and mixed precision BLAS, 2000, TOMS.
[9] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[10] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.
[11] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[12] Donald E. Knuth, et al. The Art of Computer Programming. Vol. 2: Seminumerical Algorithms, 1981.
[13] Donald Ervin Knuth, et al. The Art of Computer Programming, 1968.
[14] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[15] Anna Gavling, et al. The ART at, 2008.
[16] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[17] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.
[18] Jean-Michel Muller, et al. Handbook of Floating-Point Arithmetic (2nd Ed.), 2018.
[19] Tomoya Sakai, et al. Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems, 2011, ICCS.
[20] James Demmel, et al. Fast Reproducible Floating-Point Summation, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[21] Paolo Bientinesi, et al. Computing Petaflops over Terabytes of Data, 2012, ACM Trans. Math. Softw.
[22] David Defour, et al. Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures, 2014.