Paper information - The sparse matrix vector product on GPUs

The sparse matrix vector product on GPUs

The sparse matrix vector product (SpMV) is a paramount operation in engineering and scientific computing and, hence, has long been a subject of intense research. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors, and SpMV implementations for NVIDIA GPUs have already appeared. This work proposes and evaluates a new implementation of SpMV for GPUs based on a new matrix storage format, called ELLPACK-R, and compares it against a variety of formats proposed elsewhere. The most important qualities of this new format are that (1) no preprocessing of the sparse matrix is required, and (2) the resulting SpMV algorithm is very regular. The comparative evaluation of this new SpMV approach has been carried out on a representative set of test matrices. The results show that the SpMV approach based on ELLPACK-R is superior to the previous strategies. Moreover, a comparison with standard state-of-the-art superscalar processors reveals that significant speedup factors are achieved.
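The abstract does not spell out the ELLPACK-R layout, so the following is only a minimal CPU sketch under the commonly described assumption: the usual ELLPACK value and column-index arrays, padded to the longest row and stored column by column, plus one extra array holding the true length of each row. The names `to_ellpack_r`, `spmv_ellpack_r`, `values`, `cols`, and `rl` are hypothetical and chosen for illustration; the outer loop stands in for what would be one GPU thread per row.

```python
def to_ellpack_r(dense):
    """Build ELLPACK-R style arrays from a dense matrix given as a list of rows."""
    n_rows = len(dense)
    # Keep (column, value) pairs of the nonzeros of every row.
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    # rl[i] is the number of nonzeros in row i (the extra row-length array).
    rl = [len(r) for r in rows]
    width = max(rl, default=0)
    # Column-major layout: entry k of row i is stored at index k * n_rows + i,
    # so consecutive rows touch consecutive memory positions.
    values = [0.0] * (width * n_rows)
    cols = [0] * (width * n_rows)
    for i, r in enumerate(rows):
        for k, (j, v) in enumerate(r):
            values[k * n_rows + i] = float(v)
            cols[k * n_rows + i] = j
    return values, cols, rl


def spmv_ellpack_r(values, cols, rl, x):
    """Compute y = A @ x for a matrix A held in the arrays built above."""
    n_rows = len(rl)
    y = [0.0] * n_rows
    for i in range(n_rows):          # on a GPU: one thread per row
        acc = 0.0
        for k in range(rl[i]):       # stop at the true row length, skip padding
            idx = k * n_rows + i
            acc += values[idx] * x[cols[idx]]
        y[i] = acc
    return y


if __name__ == "__main__":
    dense = [[1, 0, 2],
             [0, 3, 0],
             [4, 5, 6]]
    values, cols, rl = to_ellpack_r(dense)
    print(spmv_ellpack_r(values, cols, rl, [1.0, 2.0, 3.0]))
```

In this sketch the inner loop runs only `rl[i]` times, so padded entries are never read; the conversion from a dense matrix is included only to make the example self-contained and is not part of the format itself.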

Ester M. Garzón | F. Vázquez | J. A. Martínez | J. J. Fernandez

[1] Michael Garland, et al. Efficient Sparse Matrix-Vector Multiplication on CUDA, 2008.

[2] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.

[3] J. Xu. OpenCL – The Open Standard for Parallel Programming of Heterogeneous Systems, 2009.

[4] William Aiello, et al. Sparse Matrix Computations on Parallel Processor Arrays, 1993, SIAM J. Sci. Comput.

[5] Sivan Toledo, et al. Improving the memory-system performance of sparse-matrix vector multiplication, 1997, IBM J. Res. Dev.

[6] Rob H. Bisseling, et al. Parallel Scientific Computation, 2004.

[7] Rajesh Bordawekar, et al. Optimizing Sparse Matrix-Vector Multiplication on GPUs, 2009.

[8] Samuel Williams, et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms, 2009, Parallel Comput.

[9] Jack J. Dongarra, et al. Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor, 2009, Parallel Comput.

[10] Eric A. Brewer, et al. ATLAS: an infrastructure for global computing, 1996, EW 7.

[11] Guillaume Caumon, et al. Concurrent number cruncher: a GPU implementation of a general sparse linear solver, 2009, Int. J. Parallel Emergent Distributed Syst.

[12] John M. Mellor-Crummey, et al. Optimizing Sparse Matrix–Vector Product Computations Using Unroll and Jam, 2004, Int. J. High Perform. Comput. Appl.