Three storage formats for sparse matrices on GPGPUs

The multiplication of a sparse matrix by a dense vector is a centerpiece of scientific computing: it is the essential kernel in the iterative solution of sparse linear systems and sparse eigenvalue problems. The efficient implementation of sparse matrix-vector multiplication is therefore crucial and has been the subject of an immense amount of research, with interest renewed at every major new trend in high-performance computing architectures. The introduction of General-Purpose Graphics Processing Units (GPGPUs) is no exception, and many articles have been devoted to this problem. In this report we propose three novel matrix formats: ELL-G and HLL, which derive from ELL, and HDIA, for matrices having a mostly diagonal sparsity pattern. We compare the performance of the proposed formats to that of state-of-the-art formats (i.e., HYB and ELLR-T) with experiments run on different GPU platforms and test matrices from various application domains.
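To make the baseline concrete, the following is a minimal sketch of the ELLPACK (ELL) layout from which the proposed formats derive. It is not the authors' implementation: every row is padded to the length of the longest row, and the matrix-vector product then becomes a regular two-dimensional loop, where on a GPU each row would typically be assigned to one thread. Function names (`csr_to_ell`, `ell_spmv`) are illustrative, not from the paper.

```python
import numpy as np

def csr_to_ell(n_rows, indptr, indices, data):
    """Convert CSR arrays to the ELL layout: pad every row to the
    maximum number of nonzeros per row, using zeros (and column 0)
    as padding so the padded entries contribute nothing."""
    max_nnz = max(indptr[i + 1] - indptr[i] for i in range(n_rows))
    ell_cols = np.zeros((n_rows, max_nnz), dtype=np.int64)
    ell_vals = np.zeros((n_rows, max_nnz))
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        k = end - start
        ell_cols[i, :k] = indices[start:end]
        ell_vals[i, :k] = data[start:end]
    return ell_cols, ell_vals

def ell_spmv(ell_cols, ell_vals, x):
    """Compute y = A @ x from the ELL representation.
    On a GPU the outer loop is parallelized (one thread per row);
    storing the arrays column-major makes the loads coalesced."""
    n_rows, max_nnz = ell_vals.shape
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for j in range(max_nnz):
            y[i] += ell_vals[i, j] * x[ell_cols[i, j]]
    return y
```

The padding is also ELL's weakness: a single long row inflates the storage of every row, which is precisely the cost that hybrid and hacked variants such as HYB and HLL aim to reduce by partitioning the matrix.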
