Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs
Stanimire Tomov | Ahmad Abdelfattah | Jack Dongarra | Cade Brown