[1] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, HiPC 2008.
[2] David E. Keyes, et al. Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU, 2012, Euro-Par Workshops.
[3] Ninghui Sun, et al. Fast implementation of DGEMM on Fermi GPU, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[4] Jack J. Dongarra, et al. Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems, 2014, Concurr. Comput. Pract. Exp.
[5] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[6] David E. Keyes, et al. Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators, 2012, VECPAR.
[7] Jack Dongarra, et al. LAPACK Users' Guide, 3rd ed., 1999.
[8] Jack J. Dongarra, et al. Optimizing symmetric dense matrix-vector multiplication on GPUs, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Jack J. Dongarra, et al. BLAS for GPUs, 2010, Scientific Computing with Multicore and Accelerators.
[10] Jack J. Dongarra, et al. Accelerating GPU Kernels for Dense Linear Algebra, 2010, VECPAR.
[11] David E. Keyes, et al. High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems, 2014, Euro-Par.
[12] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.
[13] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.
[14] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.
[15] Eric J. Kelmelis, et al. CULA: hybrid GPU accelerated linear algebra routines, 2010, Defense + Commercial Sensing.
[16] Jack J. Dongarra, et al. An Improved Magma Gemm For Fermi Graphics Processing Units, 2010, Int. J. High Perform. Comput. Appl.
[17] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Pat Hanrahan, et al. Brook for GPUs: stream computing on graphics hardware, 2004, ACM Trans. Graph.
[19] Jack J. Dongarra, et al. Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing, 2010, Parallel Comput.