Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU

A novel algorithm for solving a sparse triangular linear system in parallel on a graphics processing unit (GPU) is proposed. It solves the triangular system in two phases. First, the analysis phase builds a dependency graph from the matrix sparsity pattern and groups independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially across the constructed levels, computing all solution elements that belong to a single level in parallel. Numerical experiments show that the incomplete-LU and Cholesky preconditioned iterative methods, using the parallel sparse triangular solve algorithm, can achieve on average more than a 2× speedup on GPUs over their CPU implementation.
