Incomplete Sparse Approximate Inverses for Parallel Preconditioning

Abstract: In this paper, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods or as a simplification of sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverse” (ISAI) preconditioner is particularly efficient for solving sparse triangular linear systems of equations, which arise, for example, in incomplete factorization preconditioning. ISAI preconditioners can be generated by an algorithm that provides fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. In a study covering a large number of matrices, we identify the ISAI preconditioner as an attractive alternative to exact triangular solves in incomplete factorization preconditioning.
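
The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions: that the ISAI M of a lower-triangular matrix L is defined by requiring (ML - I) to vanish on a prescribed sparsity pattern, and that this pattern is taken to be the pattern of L itself. The function name `isai_lower` and the dense NumPy storage are illustrative choices, not taken from the paper. Each row of M comes from one small triangular system and is independent of every other row, which is where the fine-grained parallelism would come from.

```python
import numpy as np

def isai_lower(L):
    """Sketch of an incomplete sparse approximate inverse for lower-triangular L.

    Assumption: M has the same sparsity pattern as L, and (M @ L - I) is
    required to be zero on that pattern. L must have a nonzero diagonal.
    """
    n = L.shape[0]
    M = np.zeros((n, n), dtype=float)
    for i in range(n):                      # rows are independent of each other
        idx = np.flatnonzero(L[i, :])       # nonzero columns of row i
        block = L[np.ix_(idx, idx)]         # small lower-triangular submatrix
        rhs = (idx == i).astype(float)      # row i of the identity on idx
        # Solve m @ block = rhs, i.e. block.T @ m = rhs
        M[i, idx] = np.linalg.solve(block.T, rhs)
    return M

# Hypothetical 3x3 bidiagonal example
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 1.0, 2.0]])
M = isai_lower(L)
residual = M @ L - np.eye(3)
```

In this example the residual is zero on the pattern of L but not at position (2, 0), which lies outside the pattern; this is the sense in which the inverse is "incomplete". Applying M then replaces a sequential triangular solve with a matrix-vector product.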
