Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Jack Dongarra | David E. Keyes | Hatem Ltaief | Ahmad Abdelfattah
[1] Jack J. Dongarra, et al. Optimizing symmetric dense matrix-vector multiplication on GPUs, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] P. Glaskowsky. NVIDIA’s Fermi: The First Complete GPU Computing Architecture, 2009.
[3] Jack J. Dongarra, et al. Accelerating GPU Kernels for Dense Linear Algebra, 2010, VECPAR.
[4] Ninghui Sun, et al. Fast implementation of DGEMM on Fermi GPU, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[5] Jack J. Dongarra, et al. Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization, 2008, IEEE Transactions on Parallel and Distributed Systems.
[6] Samuel Williams, et al. Auto-Tuning the 27-point Stencil for Multicore, 2009.
[7] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.