Sparse computations on GPGPUs

Sparse matrix computations are ubiquitous in scientific computing, and General-Purpose computing on Graphics Processing Units (GPGPU) is fast becoming a key component of high-performance computing systems. It is therefore natural that substantial effort has been devoted to implementing sparse matrix computations on GPUs. In this paper, we discuss our work in this field, starting with the data structures we have employed to implement common operations, together with the software architecture we have devised to allow interoperability with existing software packages. To test the effectiveness of our approach, we have run experiments on two platforms; the results show that our data structures achieve very good performance, significantly better than that obtained with the most recent version of the CUSPARSE library.
