An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination

Although Gaussian elimination with partial pivoting is a robust algorithm for solving unsymmetric sparse linear systems of equations, it is difficult to implement efficiently on parallel machines because it generates work and intermediate results dynamically and somewhat unpredictably at run time. In this paper, we present an efficient parallel algorithm that overcomes this difficulty. The high performance of our algorithm is achieved through (1) a graph reduction technique and a supernode-panel computational kernel for high single-processor utilization, and (2) the scheduling of two types of parallel tasks for a high level of concurrency. One type of task factors the independent panels in disjoint subtrees of the column elimination tree of A. The other updates a panel by previously computed supernodes. A scheduler assigns tasks to free processors dynamically and facilitates a smooth transition between the two types of parallel tasks. No global synchronization is used in the algorithm. The algorithm is well suited for shared-memory multiprocessors (SMPs) with a modest number of processors. We demonstrate 4- to 7-fold speedups on a range of 8-processor SMPs, and more on larger SMPs. One realistic problem arising from a 3-D flow calculation achieves factorization rates of 1.0, 2.5, 0.8 and 0.8 Gigaflops on the 12-processor Power Challenge, 8-processor Cray C90, 16-processor Cray J90, and 8-processor AlphaServer 8400, respectively.
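The dependency-driven scheduling described above can be illustrated with a small sketch. It is not the paper's implementation: it assumes the elimination tree is given as a parent array (-1 marks a root), treats each node as one opaque task, and omits the second task type, the overlapped updating of a panel by previously computed supernodes. The names `schedule_etree`, `parent`, and `work` are hypothetical. What the sketch does show is that nodes in disjoint subtrees run concurrently and that a parent becomes ready as soon as its last child finishes, without any global barrier.

```python
import queue
import threading


def schedule_etree(parent, work, num_workers=4):
    """Run work(j) for every node j of a forest given by parent[] (-1 = root).

    A node is started only after all of its children have finished.
    Returns the order in which nodes completed.
    """
    n = len(parent)
    if n == 0:
        return []

    # pending[j] = number of children of j that have not finished yet.
    pending = [0] * n
    for p in parent:
        if p != -1:
            pending[p] += 1

    # Leaves have no dependencies, so they are ready immediately.
    ready = queue.Queue()
    for j in range(n):
        if pending[j] == 0:
            ready.put(j)

    lock = threading.Lock()
    order = []  # completion order, kept only for inspection

    def worker():
        while True:
            j = ready.get()
            if j is None:  # sentinel: every node has been processed
                return
            work(j)
            with lock:
                order.append(j)
                all_done = len(order) == n
                p = parent[j]
                if p != -1:
                    pending[p] -= 1
                    if pending[p] == 0:
                        ready.put(p)  # last child finished: parent is ready
            if all_done:
                # Wake every worker (including this one) so it can exit.
                for _ in range(num_workers):
                    ready.put(None)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    # Hypothetical 7-node tree: {0,1} -> 2, {3,4} -> 5, {2,5} -> 6 (root).
    tree = [2, 2, 6, 5, 5, 6, -1]
    print(schedule_etree(tree, lambda j: None, num_workers=3))
```

Synchronization here is purely local: a worker touches only the counter of the parent of the node it just finished, so the subtrees rooted at nodes 2 and 5 proceed independently until they meet at the root.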
