Automatic Selection of Sparse Matrix Representation on GPUs

Sparse matrix-vector multiplication (SpMV) is a core kernel in numerous applications, ranging from physics simulation and large-scale solvers to data analytics. Many GPU implementations of SpMV have been proposed, targeting several sparse representations and aiming at maximizing overall performance. No single sparse matrix representation is uniformly superior, and the best performing representation varies for sparse matrices with different sparsity patterns. In this paper, we study the inter-relation between GPU architecture, sparse matrix representation and the sparse dataset. We perform extensive characterization of pertinent sparsity features of around 700 sparse matrices, and their SpMV performance with a number of sparse representations implemented in the NVIDIA CUSP and cuSPARSE libraries. We then build a decision model using machine learning to automatically select the best representation to use for a given sparse matrix on a given target platform, based on the sparse matrix features. Experimental results on three GPUs demonstrate that the approach is very effective in selecting the best representation.

[1]  Eric S. Chung,et al.  SpMV: A Memory-Bound Application on the GPU Stuck Between a Rock and a Hard Place , 2012 .

[2]  I. Reguly,et al.  Efficient sparse matrix-vector multiplication on cache-based GPUs , 2012, 2012 Innovative Parallel Computing (InPar).

[3]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[4]  Eurípides Montagne,et al.  An Alternative Compressed Storage Format for Sparse Matrices , 2003, ISCIS.

[5]  Kurt Keutzer,et al.  clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs , 2012, ICS '12.

[6]  Youcef Saad,et al.  A Basic Tool Kit for Sparse Matrix Computations , 1990 .

[7]  Xing Liu,et al.  Efficient sparse matrix-vector multiplication on x86-based many-core processors , 2013, ICS '13.

[8]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[9]  Richard W. Vuduc,et al.  Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.

[10]  P. Sadayappan,et al.  Characterizing dataset dependence for sparse matrix-vector multiplication on GPUs , 2015 .

[11]  Michael Garland,et al.  Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[12]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[13]  Shengen Yan,et al.  yaSpMV: yet another SpMV framework on GPUs , 2014, PPoPP.

[14]  Rajesh Bordawekar,et al.  Optimizing Sparse Matrix-Vector Multiplication on GPUs , 2009 .

[15]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .

[16]  P. Sadayappan,et al.  High-performance sparse matrix-vector multiplication on GPUs for structured grid computations , 2012, GPGPU-5.

[17]  Srinivasan Parthasarathy,et al.  Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining , 2011, Proc. VLDB Endow..

[18]  Srinivasan Parthasarathy,et al.  Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[19]  P. Sadayappan,et al.  An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs , 2014, ICS '14.

[20]  Brian Vinter,et al.  CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication , 2015, ICS.

[21]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[22]  Richard Vuduc,et al.  Automatic performance tuning of sparse matrix kernels , 2003 .