Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs

Reordering techniques are commonly applied to sparse matrices to improve the performance of sparse matrix operations, in particular sparse matrix-vector multiplication (SpMV), on CPUs. In this paper, we evaluate some of the most successful reordering techniques on two different GPUs. Our study also considers several sparse matrix storage formats and covers both single- and double-precision arithmetic. We find that SpMV performance on GPUs is very sensitive to reordering. In particular, we identify several characteristics of the reordered matrices that strongly affect SpMV performance. In most cases, reordered matrices outperform the original ones, with speedups of up to 2.6x. We also observe that no single storage format is preferred over the others.
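To make the idea of reordering concrete, the following is a minimal CPU-side sketch in plain Python, not the GPU kernels or the specific techniques evaluated in the paper. The abstract does not name the reordering methods or storage formats used, so reverse Cuthill-McKee (RCM) and the CSR format are assumptions chosen here only because they are widely known. All function names (`to_csr`, `spmv_csr`, `rcm_order`, `permute`, `bandwidth`) and the 6x6 test matrix are hypothetical.

```python
from collections import deque

def to_csr(n, entries):
    """Build CSR arrays (row_ptr, col_idx, vals) from (row, col, value) triples."""
    rows = [[] for _ in range(n)]
    for r, c, v in entries:
        rows[r].append((c, v))
    row_ptr, col_idx, vals = [0], [], []
    for row in rows:
        for c, v in sorted(row):
            col_idx.append(c)
            vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals

def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    return [
        sum(vals[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]

def bandwidth(entries):
    """Largest distance of any nonzero from the main diagonal."""
    return max(abs(r - c) for r, c, _ in entries)

def rcm_order(n, entries):
    """Reverse Cuthill-McKee ordering of a symmetric pattern; order[new] = old."""
    adj = [set() for _ in range(n)]
    for r, c, _ in entries:
        if r != c:
            adj[r].add(c)
            adj[c].add(r)
    visited = [False] * n
    order = []
    # Start each breadth-first search from a lowest-degree unvisited vertex.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: (len(adj[u]), u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order

def permute(entries, order):
    """Apply the same permutation to rows and columns (B = P A P^T)."""
    new_of_old = {old: new for new, old in enumerate(order)}
    return [(new_of_old[r], new_of_old[c], v) for r, c, v in entries]

# Hypothetical 6x6 symmetric matrix: a path graph 0-3-1-5-2-4 with scrambled labels.
n = 6
edges = [(0, 3), (3, 1), (1, 5), (5, 2), (2, 4)]
entries = [(i, i, 2.0) for i in range(n)]
for a, b in edges:
    entries += [(a, b, -1.0), (b, a, -1.0)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = spmv_csr(*to_csr(n, entries), x)

order = rcm_order(n, entries)
reordered = permute(entries, order)
x_reordered = [x[old] for old in order]
y_reordered = spmv_csr(*to_csr(n, reordered), x_reordered)

bw_before = bandwidth(entries)
bw_after = bandwidth(reordered)
print("bandwidth before:", bw_before, "after:", bw_after)
```

On this toy matrix the bandwidth drops from 4 to 1, and `y_reordered` is the original result `y` with its entries permuted by `order`. A smaller bandwidth means that consecutive rows touch nearby entries of `x`, which is one plausible reason reordering can help memory access patterns; which matrix characteristics matter on GPUs is what the paper itself investigates.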
