Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems

We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast direct solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between cyclic reduction and hierarchical matrix arithmetic yield a solver with $O(k~N \log N~(\log N + k^2))$ arithmetic complexity and an $O(k~N \log N)$ memory footprint, where $N$ is the number of degrees of freedom and $k$ is the rank of a typical off-diagonal block. The solver also exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing ACR with algebraic multigrid and with the multifrontal method in which hierarchically semi-separable matrices are used to compress the fronts. Over a set of large-scale elliptic systems that include nonsymmetric and indefinite cases, the robustness of the direct solvers extends beyond that of the multigrid solver, and ACR has lower or comparable execution time and memory footprint relative to the multifrontal approach. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides.
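For readers unfamiliar with cyclic reduction, the following is a minimal sketch of the classical scalar algorithm on a tridiagonal system, written in plain Python. It is an illustration only and is not the ACR implementation: in ACR the entries are matrix blocks held in hierarchical matrix format and the work is distributed across processes, none of which is modeled here. The function names `cyclic_reduction` and `tridiag_matvec` are hypothetical helpers introduced for this example.

```python
def cyclic_reduction(a, b, c, f):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = f[i] for i = 0..n-1.

    Assumes a[0] == 0 and c[n-1] == 0, and that no pivot b[i] vanishes
    (for example, a diagonally dominant matrix). Illustrative sketch only.
    """
    n = len(b)
    if n == 1:
        return [f[0] / b[0]]

    # Reduction: eliminate even-indexed unknowns, keep odd-indexed ones.
    ra, rb, rc, rf = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1] - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rf.append(f[i] - alpha * f[i - 1] - (gamma * f[i + 1] if has_next else 0.0))

    # The reduced system is again tridiagonal, with about half the unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rf)

    # Back-substitution: recover the eliminated even-indexed unknowns.
    for j in range(0, n, 2):
        left = a[j] * x[j - 1] if j > 0 else 0.0
        right = c[j] * x[j + 1] if j + 1 < n else 0.0
        x[j] = (f[j] - left - right) / b[j]
    return x


def tridiag_matvec(a, b, c, x):
    """Multiply the tridiagonal matrix (a, b, c) by the vector x."""
    n = len(b)
    y = []
    for i in range(n):
        s = b[i] * x[i]
        if i > 0:
            s += a[i] * x[i - 1]
        if i + 1 < n:
            s += c[i] * x[i + 1]
        y.append(s)
    return y
```

Each level halves the number of unknowns, so the recursion has about $\log_2 n$ levels, and the eliminations within one level are independent of each other. That independence is the source of the concurrency mentioned above. A possible usage, on a 1D Poisson-like matrix with a manufactured solution:

```python
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
x_true = [float(i + 1) for i in range(n)]
f = tridiag_matvec(a, b, c, x_true)
x = cyclic_reduction(a, b, c, f)
```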
