A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs
P. Sadayappan | Arash Ashari | Naser Sedaghati | John Eisenlohr
[1] Jonathan D. Hogg. A Fast Dense Triangular Solve in CUDA, 2013, SIAM J. Sci. Comput.
[2] Eurípides Montagne, et al. An Alternative Compressed Storage Format for Sparse Matrices, 2003, ISCIS.
[3] Xing Liu, et al. Efficient sparse matrix-vector multiplication on x86-based many-core processors, 2013, ICS '13.
[4] James Demmel, et al. Fast Reproducible Floating-Point Summation, 2013, 2013 IEEE 21st Symposium on Computer Arithmetic.
[5] Shengen Yan, et al. yaSpMV: yet another SpMV framework on GPUs, 2014, PPoPP.
[6] Richard Vuduc, et al. Automatic performance tuning of sparse matrix kernels, 2003.
[7] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[8] I. Reguly, et al. Efficient sparse matrix-vector multiplication on cache-based GPUs, 2012, 2012 Innovative Parallel Computing (InPar).
[9] Srinivasan Parthasarathy, et al. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications, 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Srinivasan Parthasarathy, et al. Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining, 2011, Proc. VLDB Endow.
[11] Yao Zhang, et al. Scan primitives for GPU computing, 2007, GH '07.
[12] P. Sadayappan, et al. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs, 2014, ICS '14.
[13] Richard W. Vuduc, et al. Model-driven autotuning of sparse matrix-vector multiply on GPUs, 2010, PPoPP '10.
[14] Youcef Saad, et al. A Basic Tool Kit for Sparse Matrix Computations, 1990.
[15] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.
[16] Y. Saad, et al. Krylov Subspace Methods on Supercomputers, 1989.
[17] Michael Garland, et al. Implementing sparse matrix-vector multiplication on throughput-oriented processors, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[18] Michael Garland, et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.