Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers

Contemporary large-scale Partial Differential Equation (PDE) simulations usually require the solution of large, sparse linear systems. Moreover, these systems often have to be solved for several different Right-Hand Sides (RHSs), either one after another or all at once. This paper presents several strategies for extending the scalability of existing multigrid or domain decomposition linear solvers, using either appropriate recycling strategies or block methods, i.e., methods that treat multiple right-hand sides simultaneously. The scalability of this work is assessed by performing simulations on up to 8,192 cores for solving linear systems arising from various physical phenomena modeled by Poisson's equation, the system of linear elasticity, or Maxwell's equations. This work is shipped as part of an open-source software package, readily available and usable in any C/C++, Python, or Fortran code. In particular, some simulations are performed on top of a well-established library, PETSc, and it is shown how our approaches can be used to decrease time to solution by 30%.
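
To make the idea of a block method concrete, here is a minimal sketch of block conjugate gradient applied to a small 1D Poisson matrix with three right-hand sides. It is not the implementation described in this work: the function name `block_cg`, the use of NumPy, the problem size, and the tolerance are all illustrative assumptions, and the sketch omits the preconditioning and breakdown handling that a production solver would need.

```python
import numpy as np

def block_cg(A, B, tol=1e-8, max_iter=200):
    """Hypothetical helper: solve A X = B for all columns of B at once.

    A is assumed symmetric positive definite; B has one column per RHS.
    """
    X = np.zeros_like(B, dtype=float)
    R = B - A @ X                      # block residual, one column per RHS
    P = R.copy()                       # block of search directions
    RtR = R.T @ R
    b_norms = np.linalg.norm(B, axis=0)
    for it in range(max_iter):
        # Stop once every column meets the relative residual tolerance.
        if np.all(np.linalg.norm(R, axis=0) <= tol * b_norms):
            return X, it
        Q = A @ P                      # a single matrix-times-block product
        alpha = np.linalg.solve(P.T @ Q, RtR)
        X += P @ alpha
        R -= Q @ alpha
        RtR_new = R.T @ R
        beta = np.linalg.solve(RtR, RtR_new)
        P = R + P @ beta
        RtR = RtR_new
    return X, max_iter

# Small 1D Poisson matrix (tridiagonal -1, 2, -1) with three random RHSs.
n, s = 30, 3
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rng = np.random.default_rng(0)
B = rng.standard_normal((n, s))
X, iters = block_cg(A, B)
print("iterations:", iters)
print("max residual:", np.abs(A @ X - B).max())
```

Each iteration applies the operator to a whole block of vectors rather than to one vector at a time, which is the kind of operation that lets block methods trade several memory-bound matrix-vector products for one matrix-matrix product.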
