Semiautomatic Acceleration of Sparse Matrix-Vector Product Using OpenACC

The aim of this paper is to show that the well-known SPARSKIT SpMV routines for the Ellpack-Itpack and Jagged Diagonal formats can be adapted easily and successfully to a hybrid GPU-accelerated computing environment using OpenACC. We formulate general guidelines for the simple steps needed to transform source code with irregular data access into efficient OpenACC programs. We also advise how to improve the performance of such programs by tuning data structures to exploit the hardware properties of GPUs. Numerical experiments show that our accelerated versions of the SPARSKIT SpMV routines achieve performance comparable to that of the corresponding CUSPARSE routines optimized by NVIDIA.
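To illustrate the kind of transformation the abstract describes, the following is a minimal sketch of an Ellpack-Itpack SpMV loop annotated with OpenACC directives. It is not the authors' code: SPARSKIT itself is written in Fortran, and the function name `ell_spmv`, the argument order, the zero-based column indices, and the choice of data clauses are all assumptions made for this example. The arrays are stored column by column, so consecutive rows read consecutive memory locations, which is the access pattern GPUs generally favor.

```c
#include <stddef.h>

/* Hypothetical sketch: y = A*x for a matrix stored in Ellpack-Itpack form.
 *
 * n      number of rows
 * ndiag  maximum number of nonzeros per row (width of the ELL arrays)
 * ncol   number of columns, i.e. the length of x
 * coef   n*ndiag values, column-major: entry j of row i is coef[j*n + i]
 * jcoef  matching column indices, zero-based (an assumption; Fortran
 *        SPARSKIT uses one-based indices)
 *
 * Padding entries must hold the value 0.0 and any valid column index.
 */
void ell_spmv(int n, int ndiag, int ncol,
              const double *restrict coef, const int *restrict jcoef,
              const double *restrict x, double *restrict y)
{
    /* One parallel iteration per row. The data clauses copy the inputs
     * to the device and the result back. A compiler without OpenACC
     * support ignores the pragmas and runs the loops sequentially. */
    #pragma acc parallel loop copyin(coef[0:n*ndiag], jcoef[0:n*ndiag], x[0:ncol]) copyout(y[0:n])
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        /* The inner loop over the row's entries stays sequential. */
        #pragma acc loop seq
        for (int j = 0; j < ndiag; ++j) {
            int k = j * n + i;          /* column-major offset */
            sum += coef[k] * x[jcoef[k]];
        }
        y[i] = sum;
    }
}
```

In an iterative solver the matrix arrays would normally be kept on the device across calls (for example with an enclosing OpenACC data region) rather than copied on every product; the per-call clauses are used here only to keep the sketch self-contained.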
