Reproducible and Accurate Matrix Multiplication
[1] David Defour, et al. Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures, 2014.
[2] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[3] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[4] Jean-Michel Muller, et al. Handbook of Floating-Point Arithmetic (2nd Ed.), 2018.
[5] David Thomas, et al. The Art in Computer Programming, 2001.
[6] Alex Fit-Florea, et al. Precision and Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs, 2011.
[7] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[8] Ulrich W. Kulisch, et al. The exact dot product as basic tool for long interval arithmetic, 2011, Computing.
[9] Xiaoye S. Li, et al. Algorithms for quad-double precision floating point arithmetic, 2000, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.
[10] Martin Berggren, et al. Hybrid differentiation strategies for simulation and analysis of applications in C++, 2008, TOMS.
[11] James Demmel, et al. Fast Reproducible Floating-Point Summation, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[12] Paolo Bientinesi, et al. Computing Petaflops over Terabytes of Data, 2012, ACM Trans. Math. Softw.
[13] Tomoya Sakai, et al. Multi-level Optimization of Matrix Multiplication for GPU-equipped Systems, 2011, ICCS.
[14] James Demmel, et al. Design, implementation and testing of extended and mixed precision BLAS, 2000, TOMS.