Enlarged Krylov Subspace Methods and Preconditioners for Avoiding Communication

The performance of an algorithm on any architecture depends on the processing unit's speed for performing floating point operations (flops) and on the speed of accessing memory and disk. Because the cost of communication is much higher than that of arithmetic operations, and since this gap is expected to continue to increase exponentially, communication is often the bottleneck in numerical algorithms. To address the communication problem, recent research has focused on communication-avoiding Krylov subspace methods based on the so-called s-step methods. However, there are very few communication-avoiding preconditioners, and this represents a serious limitation of these methods. In this thesis, we present a communication-avoiding ILU0 preconditioner for solving large systems of linear equations (Ax=b) with iterative Krylov subspace methods. Our preconditioner makes it possible to perform s iterations of the iterative method with no communication, by applying a heuristic alternating min-max layers reordering to the input matrix A, and by ghosting some of the input data and performing redundant computation. We also introduce a new approach for reducing communication in Krylov subspace methods, which consists of enlarging the Krylov subspace by a maximum of t vectors per iteration, based on the domain decomposition of the graph of A. Compared with classical Krylov methods, the enlarged Krylov projection subspace methods converge in fewer iterations and lead to parallelizable algorithms with less communication. We discuss two new versions of Conjugate Gradient: multiple search direction with orthogonalization CG (MSDO-CG) and long recurrence enlarged CG (LRE-CG).
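The idea of enlarging the Krylov subspace can be illustrated with a small sketch. It assumes that the residual r0 is split into t vectors, one per subdomain, each keeping only the entries of r0 that belong to its subdomain, and that the enlarged basis is built by applying powers of A to this block. The function names, the 1D Laplacian test matrix, and the contiguous partition are hypothetical choices made for illustration; this is not the thesis implementation and it omits the orthogonalization that MSDO-CG and LRE-CG rely on.

```python
import numpy as np

def split_residual(r, parts, t):
    """Return an n-by-t matrix whose column j keeps the entries of r
    that belong to subdomain j and is zero elsewhere."""
    T = np.zeros((r.size, t))
    T[np.arange(r.size), parts] = r
    return T

def enlarged_basis(A, r0, parts, t, k):
    """Stack [T(r0), A T(r0), ..., A^(k-1) T(r0)] column-wise."""
    blocks = [split_residual(r0, parts, t)]
    for _ in range(k - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

# Small SPD test problem (assumption): 1D Laplacian with n unknowns.
n, t, k = 12, 3, 3
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
r0 = b.copy()  # initial residual for the starting guess x0 = 0

# Hypothetical partition: t contiguous blocks stand in for the
# subdomains that a graph partitioner would produce.
parts = np.repeat(np.arange(t), n // t)

T0 = split_residual(r0, parts, t)
B = enlarged_basis(A, r0, parts, t, k)

# The columns of T(r0) sum back to r0, so each classical Krylov vector
# A^j r0 is a combination of the columns of the enlarged basis.
krylov = np.column_stack(
    [np.linalg.matrix_power(A, j) @ r0 for j in range(k)]
)
coeffs, *_ = np.linalg.lstsq(B, krylov, rcond=None)
residual_norm = np.linalg.norm(B @ coeffs - krylov)
```

In this sketch each iteration adds up to t new basis vectors instead of one, and the classical Krylov subspace is contained in the enlarged one, which is consistent with the abstract's claim that convergence can take fewer iterations.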
