A parallel multithreaded sparse triangular linear system solver

Abstract

We propose a parallel sparse triangular linear system solver based on the Spike algorithm. Many applications require the solution of sparse triangular systems, and these solves are often a bottleneck because of their inherently sequential nature. Furthermore, many successive systems with the same coefficient matrix but different right-hand side vectors typically need to be solved. As in the banded case, the proposed solver decouples the problem at the cost of extra arithmetic operations. Compared with the banded case, the sparsity of the triangular coefficient matrix yields additional savings. We compare the parallel performance of the proposed solver with that of the state-of-the-art parallel sparse triangular solver in Intel’s Math Kernel Library (MKL) on a multicore architecture. We also show the effect of various sparse matrix reordering schemes. Numerical results show that the proposed solver outperforms MKL’s solver in ∼80% of the cases, by a factor of 2.47 on average.
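The decoupling idea can be illustrated with a small sketch. The Python code below is not the authors' solver. It is a minimal, sequential illustration of Spike-style decoupling for a lower triangular system L x = f, assuming a row-wise dict-of-dicts storage, a user-supplied contiguous row partitioning (`bounds`), and hypothetical helper names (`local_forward_solve`, `spike_lower_solve`). Each diagonal block is solved independently, one "spike" is computed per column that couples a partition to earlier ones (the extra arithmetic), a small reduced triangular system on the coupling unknowns is solved, and the full solution is then recovered independently per partition. The loops marked independent are the ones a multithreaded implementation would distribute across threads.

```python
from bisect import bisect_right
from typing import Dict, List

# Hypothetical storage format for this sketch: a sparse lower triangular matrix
# kept row-wise, rows[r] = {column: value}, with a nonzero diagonal rows[r][r].
Rows = List[Dict[int, float]]


def local_forward_solve(rows: Rows, lo: int, hi: int, rhs: List[float]) -> List[float]:
    """Forward substitution on the diagonal block L[lo:hi, lo:hi] only."""
    y = [0.0] * (hi - lo)
    for r in range(lo, hi):
        s = rhs[r - lo]
        for c, v in rows[r].items():
            if lo <= c < r:  # ignore entries outside the diagonal block
                s -= v * y[c - lo]
        y[r - lo] = s / rows[r][r]
    return y


def spike_lower_solve(rows: Rows, f: List[float], bounds: List[int]) -> List[float]:
    """Solve L x = f using contiguous row partitions [bounds[k], bounds[k+1])."""
    n = len(f)
    parts = list(zip(bounds[:-1], bounds[1:]))

    # Step 1 (independent per partition): g_k = L_kk^{-1} f_k.
    g = [0.0] * n
    for lo, hi in parts:
        g[lo:hi] = local_forward_solve(rows, lo, hi, f[lo:hi])

    # Step 2 (independent per partition): one spike w_{k,c} = L_kk^{-1} L[I_k, c]
    # per coupling column c < lo that has a nonzero in the partition's rows.
    # These are the extra operations; sparsity keeps the column set small.
    spikes: List[Dict[int, List[float]]] = []
    for lo, hi in parts:
        cols = sorted({c for r in range(lo, hi) for c in rows[r] if c < lo})
        spikes.append({
            c: local_forward_solve(rows, lo, hi, [rows[r].get(c, 0.0) for r in range(lo, hi)])
            for c in cols
        })

    # Step 3 (sequential but small): reduced unit lower triangular system on the
    # coupling unknowns. For a coupling index c in partition k:
    #   x_c = g_c - sum over c' of w_{k,c'}[c - lo_k] * x_{c'},  with every c' < lo_k.
    coupling = sorted({c for sp in spikes for c in sp})
    xc: Dict[int, float] = {}
    for c in coupling:  # increasing order, so every x_{c'} needed is already known
        k = bisect_right(bounds, c) - 1
        lo = bounds[k]
        xc[c] = g[c] - sum(w[c - lo] * xc[cp] for cp, w in spikes[k].items())

    # Step 4 (independent per partition): x_k = g_k - sum_c w_{k,c} x_c.
    x = g[:]
    for k, (lo, hi) in enumerate(parts):
        for cp, w in spikes[k].items():
            for t in range(hi - lo):
                x[lo + t] -= w[t] * xc[cp]
    return x


if __name__ == "__main__":
    # 6x6 example whose exact solution is all ones, split into three blocks.
    rows = [{0: 2.0}, {0: 1.0, 1: 4.0}, {1: -1.0, 2: 3.0},
            {0: 0.5, 3: 1.0}, {2: 2.0, 4: 5.0}, {3: 1.0, 5: 2.0}]
    f = [2.0, 5.0, 2.0, 1.5, 7.0, 3.0]
    print([round(v, 12) for v in spike_lower_solve(rows, f, [0, 2, 4, 6])])
```

In this sketch the spikes depend only on L, not on f, so when many right-hand sides share the same coefficient matrix they could be computed once and reused; the number of spikes equals the number of coupling columns, which is one place where the sparsity pattern (and hence the reordering) affects the extra cost.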
