AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods

The solution of large sparse linear systems arises in many applications, such as computational fluid dynamics and oil reservoir simulation. In realistic cases, the matrices are often so large that large-scale distributed parallel computing is required to obtain the solution of interest in a reasonable time. In this paper we discuss the design and implementation of the AmgX library, which provides drop-in GPU acceleration of distributed algebraic multigrid (AMG) and preconditioned iterative methods. The AmgX library implements both classical and aggregation-based AMG methods with different selector and interpolation strategies, along with a variety of smoothers and preconditioners, including block-Jacobi, Gauss--Seidel, and incomplete-LU factorization. The library also contains many of the standard and flexible preconditioned Krylov subspace iterative methods, each of which can be combined with any of the available multigrid methods or simpler preconditioners. The parallelism in the aggregation scheme exploits parallel...
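To make the idea of pairing a Krylov method with a simple preconditioner concrete, the following is a minimal sketch of the conjugate gradient method with a point-Jacobi (diagonal) preconditioner. It is plain Python on dense lists, not AmgX code and not GPU code; the function names (`matvec`, `dot`, `jacobi_pcg`) and the tolerance parameters are assumptions made for illustration only, and the matrix is assumed to be symmetric positive definite.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (a real solver would use a sparse format)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u: Vector, v: Vector) -> float:
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def jacobi_pcg(a: Matrix, b: Vector, tol: float = 1e-10,
               max_iters: int = 1000) -> Vector:
    """Conjugate gradient with a diagonal (Jacobi) preconditioner.

    Hypothetical helper for illustration; assumes `a` is symmetric
    positive definite with a nonzero diagonal.
    """
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]  # M^{-1} = diag(A)^{-1}
    x = [0.0] * n
    r = list(b)                                   # r = b - A*0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero

    for _ in range(max_iters):
        # Stop once the relative residual is small enough.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

For example, calling `jacobi_pcg` on the 3x3 one-dimensional Poisson matrix with rows `[2, -1, 0]`, `[-1, 2, -1]`, `[0, -1, 2]` and right-hand side `[1, 0, 1]` yields a vector close to `[1, 1, 1]`. Replacing the diagonal scaling step with a multigrid cycle is, conceptually, how an AMG method would serve as the preconditioner instead.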
