A framework for dense triangular matrix kernels on various manycore architectures
[1] Erik Elmroth et al. Applying recursion to serial and parallel QR factorization leads to better performance, 2000, IBM J. Res. Dev.
[2] David E. Keyes et al. KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators, 2014, ACM Trans. Math. Softw.
[3] David E. Keyes et al. Redesigning Triangular Dense Matrix Computations on GPUs, 2016, Euro-Par.
[4] Robert A. van de Geijn et al. High-performance implementation of the level-3 BLAS, 2008, ACM Trans. Math. Softw.
[5] Jack J. Dongarra et al. Batched matrix computations on hardware accelerators based on GPUs, 2015, Int. J. High Perform. Comput. Appl.
[6] Erik Elmroth et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software, 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45.
[7] Ioannis Caragiannis et al. Euro-Par 2012: Parallel Processing Workshops: BDMC, CGWS, HeteroPar, HiBB, OMHI, Paraphrase, PROPER, Resilience, UCHPC, VHPC, Rhodes Island, Greece, August 27–31, 2012: Revised Selected Papers, 2013.
[8] Robert A. van de Geijn et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015, ACM Trans. Math. Softw.
[9] David E. Keyes et al. Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators, 2012, VECPAR.
[10] Paolo Bientinesi et al. Recursive Algorithms for Dense Linear Algebra: The ReLAPACK Collection, 2016.
[11] Bo Kågström et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, ACM Trans. Math. Softw.
[12] Jack Dongarra et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.
[13] Jack J. Dongarra et al. High-Performance Tensor Contractions for GPUs, 2016, ICCS.
[14] David E. Keyes et al. High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications, 2015, Euro-Par.
[15] Robert A. van de Geijn et al. Level-3 BLAS on a GPU: Picking the low hanging fruit, 2012.
[16] Fred G. Gustavson et al. LAWRA: Linear Algebra with Recursive Algorithms, 2000, PARA.
[17] Yi Yang et al. BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing, 2015, ICS.
[18] James Demmel et al. FRPA: A Framework for Recursive Parallel Algorithms, 2015.
[19] Bo Kågström et al. Management of Deep Memory Hierarchies - Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Computations, 2004, PARA.
[20] David E. Keyes et al. Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU, 2012, Euro-Par Workshops.