Optimizing symmetric dense matrix-vector multiplication on GPUs
Jack J. Dongarra | Stanimire Tomov | Rajib Nath | Tingxing Dong