High-Performance Tensor Contractions for GPUs
Jack J. Dongarra | Stanimire Tomov | Azzam Haidar | Ian Masliah | Marc Baboulin | Ian Karlin | Ahmad Abdelfattah | Tzanio V. Kolev | Joël Falcou | Veselin Dobrev | Christopher W. Earl
[1] Jack J. Dongarra, et al. An Improved MAGMA GEMM for Fermi Graphics Processing Units, 2010, Int. J. High Perform. Comput. Appl.
[2] Jack J. Dongarra, et al. A Note on Auto-tuning GEMM for GPUs, 2009, ICCS.
[3] Jack J. Dongarra, et al. Batched matrix computations on hardware accelerators based on GPUs, 2015, Int. J. High Perform. Comput. Appl.
[4] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.
[5] Jack J. Dongarra, et al. LU Factorization of Small Matrices: Accelerating Batched DGETRF on the GPU, 2014, 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC, CSS, ICESS).
[6] Chun Chen, et al. Speeding up Nek5000 with autotuning and specialization, 2010, ICS '10.
[7] Jack J. Dongarra, et al. High-Performance Matrix-Matrix Multiplications of Very Small Matrices, 2016, Euro-Par.
[8] Tamara G. Kolda, et al. Tensor Decompositions and Applications, 2009, SIAM Rev.
[9] Robert J. Harrison, et al. Model-Driven SIMD Code Generation for a Multi-resolution Tensor Kernel, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[10] Erwin Laure, et al. OpenACC acceleration of the Nek5000 spectral element code, 2015, Int. J. High Perform. Comput. Appl.
[11] Sriram Krishnamoorthy, et al. Acceleration of Streamed Tensor Contraction Expressions on GPGPU-Based Clusters, 2010, 2010 IEEE International Conference on Cluster Computing.
[12] Prasanna Balaprakash, et al. Generating Efficient Tensor Contractions for GPUs, 2015, 2015 44th International Conference on Parallel Processing.
[13] John F. Stanton, et al. A massively parallel tensor contraction framework for coupled-cluster computations, 2014, J. Parallel Distributed Comput.
[14] Jack J. Dongarra, et al. Performance, Design, and Autotuning of Batched GEMM for GPUs, 2016, ISC.
[15] Jack J. Dongarra, et al. A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[16] Michael W. Mahoney, et al. Future Directions in Tensor-Based Computation and Modeling, 2009.
[17] Wen-mei W. Hwu, et al. GPU Computing Gems Jade Edition, 2011.
[18] Tzanio V. Kolev, et al. High-Order Curvilinear Finite Element Methods for Lagrangian Hydrodynamics, 2012, SIAM J. Sci. Comput.
[19] Jack J. Dongarra, et al. A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations, 2015, ISC.
[20] Jack Dongarra, et al. Towards a High-Performance Tensor Algebra Package for Accelerators, 2015.