Sparse matrix factorization on massively parallel computers

Direct methods for solving sparse systems of linear equations have high asymptotic computational and memory requirements relative to iterative methods. However, systems arising in some applications, such as structural analysis, can often be too ill-conditioned for iterative solvers to be effective. We cite real applications where this is indeed the case. Using matrices extracted from these applications, we conduct experiments on three different massively parallel architectures and show that a well-designed sparse factorization algorithm can attain very high levels of performance and scalability. We present strong scalability results for test data from real applications on up to 8,192 cores, along with both analytical and experimental weak scalability results for a model problem on up to 16,384 cores, an unprecedented number for sparse factorization. For the model problem, we also compare experimental results with multiple analytical scaling metrics and distinguish between some commonly used weak scaling methods.
