Research on the Accuracy of Single Precision on Graphics Processing Unit
Tao Yuan | Dongyi Guan | Zhu Mingfa | Xiao Limin | Ruan Li | Siming Chen | Ding Yi
[1] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[2] Nathan A. Carr,et al. Cache and bandwidth aware matrix multiplication on the GPU , 2010 .
[3] William Kahan,et al. Pracniques: further remarks on reducing truncation errors , 1965, CACM.
[4] Philipp Birken,et al. Numerical Linear Algebra , 2011, Encyclopedia of Parallel Computing.
[5] Jack M. Wolfe. Reducing truncation errors by programming , 1964, CACM.
[6] Pat Hanrahan,et al. Understanding the efficiency of GPU algorithms for matrix-matrix multiplication , 2004, Graphics Hardware.
[7] Jeffrey S. Vetter,et al. Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study , 2009, Parallel Comput..
[8] Joseph JáJá,et al. An Introduction to Parallel Algorithms , 1992 .
[9] Julien Langou,et al. Accelerating scientific computations with mixed precision algorithms , 2008, Comput. Phys. Commun..
[10] David Goldberg,et al. What every computer scientist should know about floating-point arithmetic , 1991, ACM Comput. Surv..
[11] Naga K. Govindaraju,et al. High performance discrete Fourier transforms on graphics processors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Zhou Yong. Software/Hardware Co-Design for 1-D FFT Optimization on Many-Core Architecture , 2008 .
[13] David Goldberg,et al. What every computer scientist should know about floating-point arithmetic , 1991, CSUR.
[14] Shuai Zhang,et al. Software/Hardware Co-Design for 1-D FFT Optimization on Many-Core Architecture , 2009 .
[15] David K. McAllister,et al. Fast Matrix Multiplies Using Graphics Hardware , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[16] Robert Strzodka,et al. Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations , 2007, Int. J. Parallel Emergent Distributed Syst..
[17] Dinesh Manocha,et al. A memory model for scientific algorithms on graphics processors , 2006, SC.
[18] Peter Schröder,et al. Quantum Monte Carlo on graphical processing units , 2007, Comput. Phys. Commun..
[19] S. Sitharama Iyengar,et al. Introduction to parallel algorithms , 1998, Wiley series on parallel and distributed computing.
[20] Naga K. Govindaraju,et al. High performance discrete Fourier transforms on graphics processors , 2008, HiPC 2008.
[21] Satoshi Matsuoka,et al. Software-Based ECC for GPUs , 2011 .