Parallel accelerated cyclic reduction preconditioner for three-dimensional elliptic PDEs with variable coefficients

Abstract: We present a robust and scalable preconditioner for the solution of large-scale linear systems that arise from the discretization of elliptic PDEs amenable to rank compression. The preconditioner is based on hierarchical low-rank approximations and the cyclic reduction method. The setup and application phases of the preconditioner achieve log-linear complexity in memory footprint and number of operations, and numerical experiments exhibit good weak and strong scalability at large processor counts in a distributed-memory environment. Numerical experiments on linear systems that are symmetric and nonsymmetric, definite and indefinite, and with constant and variable coefficients demonstrate the preconditioner's applicability and robustness. Furthermore, the number of iterations can be controlled through the accuracy threshold of the hierarchical matrix approximations and their arithmetic operations, and through the tuning of the admissibility condition parameter. Together, these parameters allow the memory requirements and performance of the preconditioner to be optimized.
