A New Data-Mapping Scheme for Latency-Tolerant Distributed Sparse Triangular Solution

This paper concerns latency-tolerant schemes for the efficient parallel solution of sparse triangular linear systems on distributed-memory multiprocessors. Such triangular solution is required when sparse Cholesky factors are used to solve for a sequence of right-hand-side vectors, or when incomplete sparse Cholesky factors are used to precondition a Conjugate Gradient iterative solver. In such applications, traditional distributed substitution schemes can become a performance bottleneck when the latency of interprocessor communication is large. We previously developed the Selective Inversion (SI) scheme, which reduces communication latency costs by replacing distributed substitution with parallel matrix-vector multiplication. We now present a new two-way mapping of the triangular sparse matrix to processors that improves the performance of SI by halving its communication latency costs. We provide analytic results for model sparse matrices and report on the performance of our scheme for parallel preconditioning with incomplete sparse Cholesky factors.
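The idea behind replacing substitution by matrix-vector multiplication can be illustrated with a small serial sketch. It assumes a dense lower-triangular matrix stored as nested lists and a fixed block size; the function names (`selective_inversion_setup`, `selective_inversion_solve`) and the blocking are hypothetical choices made for illustration, not the paper's distributed implementation or its two-way mapping. Only the diagonal blocks are inverted, once, so that each later solve applies them as products instead of substitutions.

```python
def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L by ordinary substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def invert_lower_triangular(D):
    """Invert a small lower-triangular block, one unit vector at a time."""
    m = len(D)
    inv = [[0.0] * m for _ in range(m)]
    for c in range(m):
        unit = [1.0 if r == c else 0.0 for r in range(m)]
        col = forward_substitution(D, unit)
        for r in range(m):
            inv[r][c] = col[r]
    return inv


def selective_inversion_setup(L, block):
    """Precompute inverses of the diagonal blocks only (illustrative helper)."""
    n = len(L)
    inverses = []
    for start in range(0, n, block):
        end = min(start + block, n)
        D = [row[start:end] for row in L[start:end]]
        inverses.append((start, end, invert_lower_triangular(D)))
    return inverses


def selective_inversion_solve(L, inverses, b):
    """Solve L x = b using matrix-vector products with the inverted blocks."""
    x = [0.0] * len(b)
    for start, end, Dinv in inverses:
        # Subtract the contribution of the already computed part of x.
        r = [b[i] - sum(L[i][j] * x[j] for j in range(start))
             for i in range(start, end)]
        # The triangular solve on the diagonal block becomes a product.
        for i in range(end - start):
            x[start + i] = sum(Dinv[i][j] * r[j] for j in range(end - start))
    return x


L = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [4.0, 1.0, 5.0, 0.0],
     [2.0, 0.0, 1.0, 4.0]]
b = [2.0, 7.0, 20.0, 17.0]

x_sub = forward_substitution(L, b)
x_si = selective_inversion_solve(L, selective_inversion_setup(L, 2), b)
```

In a distributed setting the substitution within a diagonal block forces processors to exchange data in sequence, whereas a matrix-vector product with the inverted block needs far fewer communication steps; the sketch above shows only that both approaches give the same solution, not the latency behavior.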
