SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

We present the main algorithmic features of the software package SuperLU_DIST, a distributed-memory sparse direct solver for large systems of linear equations. We describe our parallelization strategies in detail, with a focus on scalability issues, and demonstrate the software's parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that the data structures and communication patterns can be determined a priori, before numerical factorization begins. This lets us exploit techniques used in parallel sparse Cholesky algorithms to better parallelize both LU decomposition and triangular solution on large-scale distributed machines.
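
To make the idea of static pivoting concrete, the following is a minimal serial, dense sketch in Python. It is not the SuperLU_DIST API, and the abstract does not spell out these details. The tiny-pivot threshold `sqrt(EPS) * ||A||_1` and the use of a few iterative refinement steps are assumptions made for illustration, and the function names are hypothetical. The point it shows is that no rows are exchanged during elimination, so the layout of the factors is fixed before any numerical work starts.

```python
import math

EPS = 2.0 ** -52  # double-precision machine epsilon


def lu_static_pivot(A):
    """Factor A into L*U with no row interchanges.

    Assumption (not stated in the abstract): a pivot smaller in magnitude
    than sqrt(EPS) * ||A||_1 is replaced by that threshold, keeping its sign.
    Returns one matrix holding L (unit diagonal, strictly below the
    diagonal) and U (on and above the diagonal).
    """
    n = len(A)
    LU = [row[:] for row in A]
    norm1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    tiny = math.sqrt(EPS) * norm1
    for k in range(n):
        if abs(LU[k][k]) < tiny:
            # Perturb the pivot instead of swapping rows.
            LU[k][k] = tiny if LU[k][k] >= 0 else -tiny
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def lu_solve(LU, b):
    """Solve L*U*x = b by forward and back substitution."""
    n = len(b)
    x = b[:]
    for i in range(n):  # forward substitution with unit-diagonal L
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def solve_with_refinement(A, b, steps=3):
    """Solve A*x = b, then apply a few iterative refinement steps
    to reduce the error introduced by any perturbed pivots."""
    n = len(b)
    LU = lu_static_pivot(A)
    x = lu_solve(LU, b)
    for _ in range(steps):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        dx = lu_solve(LU, r)
        x = [x[i] + dx[i] for i in range(n)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]
    print(solve_with_refinement(A, b))
```

Because the elimination order never depends on the numerical values, a sparse implementation could compute the nonzero structure of L and U symbolically ahead of time, which is the property the abstract credits for the improved parallelization.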
