Structure-adaptive parallel solution of sparse triangular linear systems

Abstract: Solving sparse triangular systems of linear equations is a performance bottleneck in many methods for solving more general sparse systems. In both direct methods and many iterative preconditioners, a triangular solve is used to solve the system or to improve an approximate solution, often across many iterations. Triangular solves are notoriously resistant to parallelization, however, and existing parallel linear algebra packages appear to be ineffective at exploiting significant parallelism for this problem. We develop a novel parallel algorithm based on heuristics that adapt to the structure of the matrix and extract parallelism left unexploited by conventional methods. By analyzing and reordering operations, our algorithm can often extract parallelism even when most of the nonzero matrix entries lie near the diagonal. Our main parallelism strategies are: (1) identify independent rows, (2) send data earlier to achieve greater overlap, and (3) process dense off-diagonal regions in parallel. We describe the implementation of our algorithm in Charm++ and MPI and present promising experimental results on up to 512 cores of BlueGene/P, using numerous sparse matrices from real applications.
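
To make strategy (1) concrete, the following is a minimal sketch of level scheduling, a standard textbook way to find rows of a lower-triangular system that can be solved independently. It is not taken from the paper and may differ from the heuristics the authors use. The row format (a list of `(column, value)` pairs per row, diagonal included) and the function names `forward_substitution`, `row_levels`, and `group_by_level` are assumptions made for illustration.

```python
from collections import defaultdict


def forward_substitution(rows, b):
    """Sequentially solve L x = b for lower-triangular L.

    rows[i] is a list of (col, value) pairs with col <= i and must
    contain the diagonal entry (i, L[i][i]). Illustrative format only.
    """
    x = [0.0] * len(b)
    for i, row in enumerate(rows):
        acc = b[i]
        diag = None
        for j, v in row:
            if j == i:
                diag = v
            else:
                acc -= v * x[j]  # x[j] is already known because j < i
        x[i] = acc / diag
    return x


def row_levels(rows):
    """Assign each row a dependency level.

    A row with no off-diagonal entries has level 0. Otherwise its level
    is one more than the highest level among the rows it depends on.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j != i]
        level[i] = 1 + max(deps) if deps else 0
    return level


def group_by_level(level):
    """Group row indices by level. Rows in one group do not depend on
    each other, so they could be solved concurrently."""
    groups = defaultdict(list)
    for i, lvl in enumerate(level):
        groups[lvl].append(i)
    return [groups[lvl] for lvl in sorted(groups)]


# Small hypothetical 4x4 example.
rows = [
    [(0, 2.0)],
    [(1, 4.0)],
    [(0, 1.0), (2, 1.0)],
    [(1, 2.0), (2, 1.0), (3, 2.0)],
]
b = [2.0, 4.0, 3.0, 6.0]

x = forward_substitution(rows, b)
levels = row_levels(rows)
groups = group_by_level(levels)
```

In this example rows 0 and 1 share level 0 and could be processed at the same time, while rows 2 and 3 must wait for earlier levels. When most nonzeros sit near the diagonal, such levels tend to be small, which is the hard case the abstract refers to.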
