BiELL: A bisection ELLPACK-based storage format for optimizing SpMV on GPUs

Abstract Sparse matrix–vector multiplication (SpMV) is one of the most important high level operations for basic linear algebra. Nowadays, the GPU has evolved into a highly parallel coprocessor which is suited to compute-intensive, highly parallel computation. Achieving high performance of SpMV on GPUs is relatively challenging, especially when the matrix has no specific structure. For these general sparse matrices, a new data structure based on the bisection ELLPACK format, BiELL, is designed to realize the load balance better, and thus improve the performance of the SpMV. Besides, based on the same idea of JAD format, the BiJAD format can be obtained. Experimental results on various matrices show that the BiELL and BiJAD formats perform better than other similar formats, especially when the number of non-zero elements per row varies a lot.

[1]  Francisco Vázquez,et al.  A new approach for sparse matrix vector product on NVIDIA GPUs , 2011, Concurr. Comput. Pract. Exp..

[2]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[3]  Arutyun Avetisyan,et al.  Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures , 2010, HiPEAC.

[4]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .

[5]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[6]  Satoshi Matsuoka,et al.  Fast Conjugate Gradients with Multiple GPUs , 2009, ICCS.

[7]  Richard W. Vuduc,et al.  Sparsity: Optimization Framework for Sparse Matrix Kernels , 2004, Int. J. High Perform. Comput. Appl..

[8]  Francisco Vázquez,et al.  Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach , 2012, Parallel Comput..

[9]  Rajesh Bordawekar,et al.  Optimizing Sparse Matrix-Vector Multiplication on GPUs , 2009 .

[10]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[11]  Richard W. Vuduc,et al.  Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.

[12]  Atsushi Suzuki,et al.  New Row-grouped CSR format for storing the sparse matrices on GPU with implementation in CUDA , 2010, ArXiv.

[13]  Yousef Saad,et al.  Iterative methods for sparse linear systems , 2003 .

[14]  Kevin Skadron,et al.  Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).

[15]  Yousef Saad,et al.  GPU-accelerated preconditioned iterative linear solvers , 2013, The Journal of Supercomputing.