Auto-Tuning CUDA Parameters for Sparse Matrix-Vector Multiplication on GPUs

Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. The sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and select CUDA parameters for SpMV to obtain the optimal performance on specific GPUs. The framework is evaluated on two NVIDIA GPU platforms, GeForce 9500 GTX and GeForce GTX 295.

[1]  Eitan Grinspun,et al.  Sparse matrix solvers on the GPU: conjugate gradients and multigrid , 2003, SIGGRAPH Courses.

[2]  Victor Eijkhout,et al.  Self-Adapting Linear Algebra Algorithms and Software , 2005, Proceedings of the IEEE.

[3]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[4]  Rajesh Bordawekar,et al.  Optimizing Sparse Matrix-Vector Multiplication on GPUs using Compile-time and Run-time Strategies , 2008 .

[5]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .

[6]  Michael Garland,et al.  Implementing sparse matrix-vector multiplication on throughput-oriented processors , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[7]  Jack J. Dongarra,et al.  Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor , 2009, Parallel Comput..

[8]  Satoshi Matsuoka,et al.  Auto-tuning 3-D FFT library for CUDA GPUs , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[9]  Jie Cheng,et al.  Programming Massively Parallel Processors. A Hands-on Approach , 2010, Scalable Comput. Pract. Exp..

[10]  Richard W. Vuduc,et al.  Model-driven autotuning of sparse matrix-vector multiply on GPUs , 2010, PPoPP '10.