s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid

Geometric multigrid solvers within adaptive mesh refinement (AMR) applications often reach a point where further coarsening of the grid becomes impractical because individual subdomain sizes approach unity. At this point, the most common solution is to use a bottom solver, such as BiCGStab, to reduce the residual by a fixed factor at the coarsest level. Each iteration of BiCGStab requires multiple global reductions (MPI collectives). Because the number of BiCGStab iterations required for convergence grows with problem size, and the time for each collective operation increases with machine scale, bottom solves in large-scale applications can constitute a significant fraction of the overall multigrid solve time. In this paper, we implement, evaluate, and optimize a communication-avoiding s-step formulation of BiCGStab (CABiCGStab for short) as a high-performance, distributed-memory bottom solver for geometric multigrid solvers. This is the first time s-step Krylov subspace methods have been leveraged to improve multigrid bottom solver performance. We use a synthetic benchmark for detailed analysis and integrate the best implementation into BoxLib in order to evaluate the benefit of an s-step Krylov subspace method on the multigrid solves found in the applications LMC and Nyx on up to 32,768 cores on the Cray XE6 at NERSC. Overall, we see bottom solver improvements of up to 4.2x on synthetic problems and up to 2.7x in real applications. This results in as much as a 1.5x improvement in solver performance in real applications.
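To make the communication argument concrete, the following is a minimal sketch of the general idea behind s-step methods, not the CABiCGStab implementation evaluated in the paper. It assumes a monomial Krylov basis [v, Av, ..., A^s v], a hypothetical 1D test operator tridiag(-1, 2, -1), and a `Reductions` counter that stands in for MPI collectives; all of these names and choices are illustrative. Every inner product between vectors in the span of the basis can be recovered from a single Gram matrix, so several separate reductions collapse into one.

```python
class Reductions:
    """Counts stand-ins for global reductions (e.g. MPI_Allreduce calls)."""

    def __init__(self):
        self.count = 0

    def dot(self, x, y):
        self.count += 1  # classical approach: one reduction per inner product
        return sum(a * b for a, b in zip(x, y))

    def gram(self, basis):
        self.count += 1  # all pairwise inner products share a single reduction
        return [[sum(a * b for a, b in zip(u, w)) for w in basis] for u in basis]


def apply_a(x):
    """y = A x for tridiag(-1, 2, -1); a hypothetical test operator."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def monomial_basis(v, s):
    """Return the s + 1 vectors [v, A v, ..., A^s v]."""
    basis = [list(v)]
    for _ in range(s):
        basis.append(apply_a(basis[-1]))
    return basis


def combine(basis, coeffs):
    """Return sum_j coeffs[j] * basis[j]."""
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis))
            for i in range(len(basis[0]))]


def dot_from_gram(g, a, b):
    """Evaluate a^T G b using only small, locally held coefficient vectors."""
    return sum(a[i] * g[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def compare():
    """Compute three inner products both ways and report reduction counts."""
    v = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
    basis = monomial_basis(v, 3)  # s = 3, so coefficient vectors have length 4
    coeff_pairs = [
        ([1.0, 0.5, 0.0, 0.0], [0.0, 1.0, -0.25, 0.0]),
        ([0.0, 0.0, 1.0, 0.5], [1.0, 0.0, 0.0, -0.125]),
        ([1.0, 1.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.0]),
    ]
    classical, blocked = Reductions(), Reductions()
    direct = [classical.dot(combine(basis, a), combine(basis, b))
              for a, b in coeff_pairs]
    g = blocked.gram(basis)
    via_gram = [dot_from_gram(g, a, b) for a, b in coeff_pairs]
    return direct, via_gram, classical.count, blocked.count
```

Calling `compare()` returns both sets of inner products along with the number of simulated reductions each approach used. The trade-off in this sketch is that the single reduction carries (s+1)^2 values instead of one, which is favorable when collective latency dominates. The monomial basis is used here only for brevity; it becomes ill-conditioned as s grows, which is one reason practical s-step solvers need care in choosing s and the basis.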
