Parallel Symbolic Factorization for Sparse LU with Static Pivoting

This paper presents the design and implementation of a memory-scalable parallel symbolic factorization algorithm for general sparse unsymmetric matrices. Our parallel algorithm applies graph partitioning to the graph of $|A|+|A|^T$ to partition the matrix in a way that favors both sparsity preservation and parallel factorization. The partitioning yields a so-called separator tree, which represents the dependencies among the computations. We use the separator tree to distribute the input matrix over the processors using a block cyclic approach and a subtree-to-subprocessor mapping. The parallel algorithm performs a bottom-up traversal of the separator tree. With a combination of right-looking and left-looking partial factorizations, the algorithm obtains one column structure of $L$ and one row structure of $U$ at each step. The algorithm is implemented in C and MPI. A performance study on large matrices shows that the parallel algorithm significantly reduces the memory requirement of the symbolic factorization step, as well as the overall memory requirement of the parallel solver. It is also often faster than the sequential algorithm, whose runtime is already relatively small. In general, the parallel algorithm prevents the symbolic factorization step from becoming a time or memory bottleneck of the parallel solver.
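
To make the phrase "one column structure of $L$ and one row structure of $U$ at each step" concrete, the following is a minimal sequential sketch of symbolic LU factorization without pivoting. It is not the parallel algorithm described above: it is a naive right-looking elimination on the nonzero pattern only, and the function name `symbolic_lu` and the 0-based `(row, column)` input format are assumptions made for illustration.

```python
def symbolic_lu(n, entries):
    """Naive symbolic LU on a nonzero pattern (no pivoting, no numerics).

    n       -- matrix dimension
    entries -- iterable of (row, col) positions of nonzeros, 0-based
    Returns (l_cols, u_rows), where l_cols[k] lists the rows i > k with a
    nonzero in column k of L, and u_rows[k] lists the columns j > k with a
    nonzero in row k of U (diagonal entries are left implicit).
    """
    # rows[i]: column indices present in row i; cols[j]: row indices in column j.
    rows = [set() for _ in range(n)]
    cols = [set() for _ in range(n)]
    for i, j in entries:
        rows[i].add(j)
        cols[j].add(i)

    l_cols, u_rows = [], []
    for k in range(n):
        # Step k yields one column structure of L and one row structure of U.
        l_k = sorted(i for i in cols[k] if i > k)
        u_k = sorted(j for j in rows[k] if j > k)
        l_cols.append(l_k)
        u_rows.append(u_k)
        # Right-looking update: eliminating pivot k creates a (possibly new)
        # nonzero at (i, j) for every i in l_k and j in u_k.
        for i in l_k:
            for j in u_k:
                rows[i].add(j)
                cols[j].add(i)
    return l_cols, u_rows


def arrow_pattern(n, dense_index):
    """Pattern with a full diagonal plus one dense row and column."""
    pattern = {(i, i) for i in range(n)}
    pattern |= {(dense_index, j) for j in range(n)}
    pattern |= {(i, dense_index) for i in range(n)}
    return pattern


if __name__ == "__main__":
    # Dense row/column ordered first: the factors fill in completely.
    print(symbolic_lu(4, arrow_pattern(4, 0)))
    # Dense row/column ordered last: no fill at all.
    print(symbolic_lu(4, arrow_pattern(4, 3)))
```

The two arrow-shaped patterns differ only in ordering, yet the first produces fully dense factors while the second produces no fill. This is the reason the ordering obtained from partitioning matters for sparsity preservation.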
