An efficient sparse approximate inverse preconditioning algorithm on GPU

The sparse approximate inverse (SPAI) preconditioner has proven effective in accelerating the convergence of iterative methods. Because constructing it is expensive, accelerating its construction on the graphics processing unit (GPU) has recently attracted considerable attention. This motivates us to investigate how to accelerate the construction of SPAI preconditioners on GPU. We propose an efficient sparse approximate inverse algorithm on GPU, called SPAI‐Adaptive, with the following novelties: (1) an adaptive thread allocation strategy assigns the optimal number of threads to each column of the preconditioner, and (2) each step in computing the preconditioner, namely finding the index sets I and J, constructing the local submatrix, computing the QR decomposition of the local submatrix, and solving the upper triangular linear system, is performed in parallel inside a GPU thread group. Experimental results show that the proposed SPAI‐Adaptive is effective and achieves good performance and high parallelism.
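
The per-column steps named above (index sets I and J, local submatrix, QR decomposition, upper triangular solve) can be illustrated with a minimal sequential sketch. This is not the SPAI‐Adaptive GPU kernel: it runs on the CPU in plain Python, stores the matrix densely, assumes a nonsingular matrix, and assumes a static sparsity pattern in which column k of the preconditioner M takes the nonzero pattern of column k of A. The function names `spai_column` and `spai` are hypothetical. Each column is an independent least-squares problem, min ||A(I,J) m - e_k(I)||_2, which is what makes it natural to hand one column to one thread group.

```python
import math

def spai_column(A, k, J):
    """Least-squares solve for column k of M, restricted to row indices J.

    Illustrative helper (not from the paper). Assumes A(:, J) has full
    column rank, which holds when A is nonsingular.
    """
    n = len(A)
    # Index set I: rows of A that have a nonzero in at least one column of J.
    I = [i for i in range(n) if any(A[i][j] != 0.0 for j in J)]
    # Local dense submatrix A(I, J), stored column by column.
    cols = [[A[i][j] for i in I] for j in J]
    # Right-hand side: unit vector e_k restricted to the rows in I.
    rhs = [1.0 if i == k else 0.0 for i in I]

    # QR decomposition by modified Gram-Schmidt.
    p = len(J)
    Q = []                                  # orthonormal columns
    R = [[0.0] * p for _ in range(p)]       # upper triangular factor
    for c in range(p):
        v = cols[c][:]
        for r in range(c):
            R[r][c] = sum(q * x for q, x in zip(Q[r], v))
            v = [x - R[r][c] * q for x, q in zip(v, Q[r])]
        R[c][c] = math.sqrt(sum(x * x for x in v))
        Q.append([x / R[c][c] for x in v])

    # Solve the upper triangular system R m = Q^T rhs by back substitution.
    y = [sum(q * b for q, b in zip(Q[c], rhs)) for c in range(p)]
    m = [0.0] * p
    for c in range(p - 1, -1, -1):
        s = y[c] - sum(R[c][d] * m[d] for d in range(c + 1, p))
        m[c] = s / R[c][c]
    return dict(zip(J, m))

def spai(A):
    """Build M column by column; every iteration is independent of the others."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Assumed static pattern: nonzero rows of column k of A.
        J = [i for i in range(n) if A[i][k] != 0.0]
        for j, value in spai_column(A, k, J).items():
            M[j][k] = value
    return M
```

Calling `spai(A)` on a small nonsingular matrix given as a list of rows returns M in the same layout. The sizes of I and J vary from column to column, which is the kind of imbalance a per-column thread allocation is meant to address.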
