An efficient GPU version of the preconditioned GMRES method

In many scientific applications, the solution of sparse linear systems is the stage that concentrates most of the computational effort. This has motivated the study and development of several iterative solvers, among which preconditioned Krylov subspace methods hold a prominent place. In previous work, we developed a GPU-aware version of the GMRES method included in ILUPACK, a package of solvers distinguished by its inverse-based multilevel ILU preconditioner. In this work, we study the performance of that earlier proposal and integrate several enhancements to mitigate its principal bottlenecks. The numerical evaluation shows that the new proposal can reach significant run-time reductions.
