The BiConjugate gradient method on GPUs

Applications across many scientific and engineering fields require the solution of complex and/or nonsymmetric linear systems of equations. The BiConjugate Gradient method (BCG) is especially relevant for solving this kind of linear system. Nevertheless, BCG has an enormous computational cost. GPU computing is useful for accelerating this kind of algorithm, but suitable implementations must be developed to exploit the GPU architecture optimally. In this paper, we show how BCG can be effectively accelerated when all operations are computed on a GPU. To this end, BCG has been implemented with two alternative routines for the sparse matrix-vector product (SpMV): the one provided by the CUSPARSE library and the ELLR-T routine. Although our interest is focused on complex matrices, our implementation has been evaluated on a GPU for two sets of test matrices, complex and real, in both single and double precision. Experimental results show that BCG based on the ELLR-T routine achieves the best performance, particularly for the set of complex test matrices. Consequently, this method can be a useful tool for efficiently solving the large linear systems of equations (complex and/or nonsymmetric) involved in a broad range of applications.
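To make the structure of the method concrete, the following is a minimal CPU sketch of BCG for a complex nonsymmetric system, written in plain Python. It is not the GPU implementation evaluated in the paper. The function names (`to_ellr`, `spmv`, `dot`, `bicg`), the zero initial guess, the choice of the shadow residual equal to the initial residual, and the stopping rule are all assumptions made for illustration. The matrix is stored in an ELLPACK-R style layout (padded value and column arrays plus a per-row length array), which is the kind of layout the ELLR-T routine builds on; the row loop here is serial, whereas a GPU kernel would distribute rows across threads.

```python
def to_ellr(dense):
    """Pack a dense matrix (list of rows) into ELLPACK-R style arrays."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max(len(r) for r in rows)
    vals = [[v for _, v in r] + [0] * (width - len(r)) for r in rows]
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    row_len = [len(r) for r in rows]  # number of real (non-padding) entries
    return vals, cols, row_len


def spmv(ell, x):
    """Sparse matrix-vector product y = A x; padding is skipped via row_len."""
    vals, cols, row_len = ell
    return [sum(vals[i][k] * x[cols[i][k]] for k in range(row_len[i]))
            for i in range(len(row_len))]


def dot(u, v):
    """Inner product u^H v (first argument conjugated)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def bicg(dense, b, tol=1e-10, max_iter=200):
    """BiConjugate Gradient sketch; returns (solution, iterations used)."""
    n = len(b)
    A = to_ellr(dense)
    # The shadow recurrence needs products with the conjugate transpose A^H.
    AH = to_ellr([[complex(dense[j][i]).conjugate() for j in range(n)]
                  for i in range(n)])
    x = [0j] * n
    r = [complex(bi) for bi in b]      # r = b - A x with x = 0
    rt = list(r)                       # shadow residual (assumed choice)
    p, pt = list(r), list(rt)
    rho = dot(rt, r)
    b_norm = abs(dot(b, b)) ** 0.5
    for it in range(max_iter):
        if abs(dot(r, r)) ** 0.5 <= tol * b_norm:
            return x, it
        q = spmv(A, p)
        qt = spmv(AH, pt)
        alpha = rho / dot(pt, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha.conjugate() * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta.conjugate() * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x, max_iter


# Small complex nonsymmetric example (hypothetical data).
A_example = [[4 + 1j, 1, 0], [2, 5 - 1j, 1j], [0, 1, 3 + 2j]]
b_example = [4 + 2j, 3 + 4j, -3 - 1j]
x_example, iters_example = bicg(A_example, b_example)
```

Each iteration performs two sparse products, one with the matrix and one with its conjugate transpose, plus a handful of dot products and vector updates. The two SpMV calls dominate the cost, which is why the choice of SpMV routine matters so much for overall BCG performance. The sketch omits breakdown checks (a zero `rho` or a zero denominator in `alpha`), which a robust solver would need.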
