MAGMA templates for scalable linear algebra on emerging architectures
Stanimire Tomov | Azzam Haidar | Mark Gates | Ahmad Abdelfattah | Jack Dongarra | Mohammed Al Farhan | Dalal Sukkari | Robert Rosenberg
[1] Daniel Sunderland, et al. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns, 2014, J. Parallel Distributed Comput.
[2] Jack Dongarra, et al. MAGMA-sparse Interface Design Whitepaper, 2017.
[3] Brian Vinter, et al. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication, 2015, ICS.
[4] Nicholas J. Higham, et al. Harnessing GPU Tensor Cores for Fast FP16 Arithmetic to Speed up Mixed-Precision Iterative Refinement Solvers, 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Jack Dongarra, et al. C++ API for BLAS and LAPACK, 2017.
[6] Timothy A. Davis, et al. The University of Florida sparse matrix collection, 2011, TOMS.
[7] Jack Dongarra, et al. Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems, 2015, Supercomput. Front. Innov.
[8] Yousef Saad, et al. GPU-accelerated preconditioned iterative linear solvers, 2013, The Journal of Supercomputing.
[9] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[10] Timothy C. Warburton, et al. OCCA: A unified approach to multi-threading languages, 2014, ArXiv.
[11] Jack J. Dongarra, et al. SLATE: design of a modern distributed and accelerated linear algebra library, 2019, SC.
[12] David E. Keyes, et al. Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering, 2018, SIAM J. Sci. Comput.
[13] David E. Keyes, et al. Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures, 2018, IEEE Transactions on Parallel and Distributed Systems.
[14] William Gropp, et al. A hybrid format for better performance of sparse matrix-vector multiplication on a GPU, 2016, Int. J. High Perform. Comput. Appl.
[15] Jack J. Dongarra, et al. Massively Parallel Automated Software Tuning, 2019, ICPP.
[16] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[17] J. Dongarra, et al. Implementing a Sparse Matrix Vector Product for the SELL-C / SELL-C-σ formats on NVIDIA GPUs, 2014.
[18] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.
[19] David E. Keyes, et al. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture, 2017, Euro-Par.
[20] Jack J. Dongarra, et al. A Note on Auto-tuning GEMM for GPUs, 2009, ICCS.
[21] Jack Dongarra, et al. Designing SLATE: Software for Linear Algebra Targeting Exascale, 2017.
[22] David E. Keyes, et al. Unstructured computational aerodynamics on many integrated core architecture, 2014, Parallel Comput.
[23] Tamara G. Kolda, et al. An overview of the Trilinos project, 2005, TOMS.
[24] Jack Dongarra, et al. Roadmap for the Development of a Linear Algebra Library for Exascale Computing: SLATE: Software for Linear Algebra Targeting Exascale, 2017.
[25] Jack J. Dongarra, et al. Investigating half precision arithmetic to accelerate dense linear system solvers, 2017, ScalA@SC.
[26] Jack Dongarra, et al. Least squares solvers for distributed-memory machines with GPU accelerators, 2019, ICS.
[27] Mohammed Al Farhan, et al. Unstructured Computations on Emerging Architectures, 2019.
[28] Jack Dongarra, et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, 2009.