Numerical behavior of NVIDIA tensor cores
Massimiliano Fasi | Nicholas J. Higham | Mantas Mikaitis | Srikara Pranesh
[1] Daichi Mukunoki, et al. DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions, 2020, ISC.
[2] Wolfgang J. Paul, et al. System Architecture, 2016, Springer International Publishing.
[3] Nicholas J. Higham, et al. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Earl E. Swartzlander, et al. A Fused Floating-Point Four-Term Dot Product Unit, 2016, IEEE Transactions on Circuits and Systems I: Regular Papers.
[5] Nicholas J. Higham, et al. Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing, 2020.
[6] Alexandre F. Tenca, et al. Multi-operand Floating-Point Addition, 2009, 19th IEEE Symposium on Computer Arithmetic.
[7] Nicholas J. Higham, et al. A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic, 2020, arXiv.
[8] Jean-Michel Muller, et al. Handbook of Floating-Point Arithmetic (2nd ed.), 2018.
[9] Sanu Mathew, et al. Optimized Fused Floating-Point Many-Term Dot-Product Hardware for Machine Learning Accelerators, 2019, 26th IEEE Symposium on Computer Arithmetic (ARITH).
[10] Nicholas J. Higham, et al. Mixed Precision Block Fused Multiply-Add: Error Analysis and Application to GPU Tensor Cores, 2020, SIAM Journal on Scientific Computing.
[11] Jeffrey S. Vetter, et al. NVIDIA Tensor Core Programmability, Performance & Precision, 2018, IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[12] Marco Maggioni, et al. Dissecting the NVidia Turing T4 GPU via Microbenchmarking, 2019, arXiv.
[13] Mark Horowitz, et al. Rounding Algorithms for IEEE Multipliers, 1989, 9th Symposium on Computer Arithmetic.
[14] Xiaowen Chu, et al. Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply, 2020, IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[15] Lee-Sup Kim, et al. A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency, 2009, IEEE Transactions on Computers.
[16] James Demmel, et al. IEEE Standard for Floating-Point Arithmetic, 2008.
[17] Paolo Rech, et al. Impact of Tensor Cores and Mixed Precision on the Reliability of Matrix Multiplication in GPUs, 2020, IEEE Transactions on Nuclear Science.
[18] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, arXiv.
[19] David Seal, et al. ARM Architecture Reference Manual, 2001.
[20] Nicholas J. Higham, et al. Mixed-Precision Iterative Refinement Using Tensor Cores on GPUs to Accelerate Solution of Linear Systems, 2020, Proceedings of the Royal Society A.
[21] Jack J. Dongarra, et al. The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques, 2018, ICCS.
[22] Brian J. Hickmann, et al. Experimental Analysis of Matrix Multiplication Functional Units, 2019, 26th IEEE Symposium on Computer Arithmetic (ARITH).
[23] David J. Harper, et al. Paranoia, 2009, The Harvard Mental Health Letter.
[24] Tao Yao, et al. Correctly Rounded Architectures for Floating-Point Multi-operand Addition and Dot-Product Computation, 2013, IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors.
[25] Milos D. Ercegovac, et al. Digital Arithmetic, 2003, Wiley Encyclopedia of Computer Science and Engineering.