Solving finite difference linear systems on GPUs: CUDA based Parallel Explicit Preconditioned Biconjugate Conjugate Gradient type Methods

Over the last decades, explicit approximate inverse preconditioning methods have been used for efficiently solving sparse linear systems on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on the use of efficient preconditioners that are close approximants to the inverse of the coefficient matrix and are fast to compute in parallel. A new parallel computational technique is proposed for the parallelization of the explicit preconditioned biconjugate conjugate gradient type methods on a Graphics Processing Unit (GPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA), developed by NVIDIA. The matrix–vector and vector–vector products involved in the explicit preconditioned biconjugate conjugate gradient type schemes are inherently parallel and exhibit significant loop-level parallelism, which can lead to high performance gains on GPU systems, since these are specifically designed for such computations. Finally, numerical results are presented for the performance of the explicit preconditioned biconjugate conjugate gradient type method in solving characteristic two-dimensional boundary value problems, discretized by the finite difference method, on the massively parallel multiprocessor architecture of a GPU. The CUDA implementation issues of the proposed method are also discussed.
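
To make the loop-level parallelism concrete, the following is a minimal serial sketch in Python, not the CUDA implementation described above. It assumes a BiCGSTAB iteration as one representative of the biconjugate conjugate gradient type family, the 5-point finite difference Laplacian as the coefficient matrix, and the trivial approximate inverse M = diag(A)^{-1} as a stand-in for the explicit approximate inverse. All function names are hypothetical. Every loop over `k`, and every element-wise update, has independent iterations that would map to one GPU thread each; the inner products would become parallel reductions.

```python
import math

def stencil_matvec(x, n):
    """y = A x for the 5-point Laplacian on an n-by-n grid (row-major order)."""
    y = [0.0] * (n * n)
    for k in range(n * n):  # iterations are independent: one GPU thread per row
        i, j = divmod(k, n)
        s = 4.0 * x[k]
        if i > 0:
            s -= x[k - n]
        if i < n - 1:
            s -= x[k + n]
        if j > 0:
            s -= x[k - 1]
        if j < n - 1:
            s -= x[k + 1]
        y[k] = s
    return y

def dot(u, v):
    """Inner product; on a GPU this would be a parallel reduction."""
    return sum(a * b for a, b in zip(u, v))

def axpy(a, x, y):
    """Element-wise a*x + y; every element is independent."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def apply_approx_inverse(r):
    """Assumed stand-in for the explicit approximate inverse: M = diag(A)^{-1}.
    Explicit preconditioning applies M by multiplication, with no triangular solve."""
    return [0.25 * ri for ri in r]

def explicit_pbicgstab(b, n, tol=1e-10, max_iter=500):
    """Explicitly preconditioned BiCGSTAB for A x = b, starting from x = 0."""
    size = len(b)
    x = [0.0] * size
    r = b[:]                      # r0 = b - A*0
    r_hat = r[:]                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * size
    p = [0.0] * size
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = axpy(beta, axpy(-omega, v, p), r)   # p = r + beta*(p - omega*v)
        y = apply_approx_inverse(p)
        v = stencil_matvec(y, n)
        alpha = rho_new / dot(r_hat, v)
        s = axpy(-alpha, v, r)                  # s = r - alpha*v
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            return axpy(alpha, y, x), it        # converged at the half step
        z = apply_approx_inverse(s)
        t = stencil_matvec(z, n)
        omega = dot(t, s) / dot(t, t)
        x = axpy(alpha, y, axpy(omega, z, x))   # x = x + alpha*y + omega*z
        r = axpy(-omega, t, s)                  # r = s - omega*t
        rho = rho_new
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
    return x, max_iter
```

A call such as `explicit_pbicgstab(stencil_matvec(x_true, 8), 8)` returns the approximate solution together with the iteration count. In a GPU version, `stencil_matvec`, `axpy`, and `apply_approx_inverse` would each plausibly become one kernel launch per call, while the scalar recurrences for `alpha`, `beta`, and `omega` would remain sequential between launches.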
