Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format
[1] Matemática, et al. Society for Industrial and Applied Mathematics, 2010.
[2] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[3] John R. Gilbert, et al. High-Performance Graph Algorithms from Parallel Sparse Matrices, 2006, PARA.
[4] Rajesh Bordawekar, et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs using Compile-time and Run-time Strategies, 2008.
[5] Iain S. Duff, et al. An overview of the sparse basic linear algebra subprograms: The new standard from the BLAS technical forum, 2002, TOMS.
[6] Lukasz Miroslaw, et al. Compressed Multiple-Row Storage Format, 2012, ArXiv.
[7] Kurt Keutzer, et al. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs, 2012, ICS '12.
[8] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[9] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[10] Shengen Yan, et al. yaSpMV: yet another SpMV framework on GPUs, 2014, PPoPP.
[11] Murat Efe Guney, et al. On the limits of GPU acceleration, 2010.
[12] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[13] Shengen Yan, et al. StreamScan: fast scan algorithms for GPUs without global barrier synchronization, 2013, PPoPP '13.
[14] L. Trefethen, et al. Numerical linear algebra, 1997.
[15] Lukasz Miroslaw, et al. Compressed Multirow Storage Format for Sparse Matrices on Graphics Processing Units, 2012, SIAM J. Sci. Comput.
[16] Karl Rupp, et al. ViennaCL: A High Level Linear Algebra Library for GPUs and Multi-Core CPUs, 2010.
[17] D. Keyes, et al. Toward Realistic Performance Bounds for Implicit CFD, 1999.
[18] Richard Vuduc, et al. Automatic performance tuning of sparse matrix kernels, 2003.
[19] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[20] Eric S. Chung, et al. SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place, 2012.
[21] I. Reguly, et al. Efficient sparse matrix-vector multiplication on cache-based GPUs, 2012, 2012 Innovative Parallel Computing (InPar).
[22] Michael Garland, et al. Sparse matrix computations on manycore GPU’s, 2008, 2008 45th ACM/IEEE Design Automation Conference.
[23] Arutyun Avetisyan, et al. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures, 2010, HiPEAC.
[24] Eun-Jin Im, et al. Optimization of Sparse Matrix Kernels for Data Mining, 2007.
[25] Atsushi Suzuki, et al. New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA, 2010, ArXiv.
[26] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[27] Gerhard Wellein, et al. A unified sparse matrix data format for modern processors with wide SIMD units, 2013, ArXiv.
[28] Rajesh Bordawekar, et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs, 2009.
[29] Francisco Vázquez, et al. A new approach for sparse matrix vector product on NVIDIA GPUs, 2011, Concurr. Comput. Pract. Exp.
[30] Hai Jin, et al. Optimization of Sparse Matrix-Vector Multiplication with Variant CSR on GPUs, 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.
[31] William Gropp, et al. Adaptive thread distributions for SpMV on a GPU, 2012.
[32] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.