Parallel Algorithms for Forward and Back Substitution in Direct Solution of Sparse Linear Systems

A few parallel algorithms for solving triangular systems resulting from parallel factorization of sparse linear systems have been proposed and implemented recently. We present a detailed analysis of the parallel complexity and scalability of the best of these algorithms, together with the results of its implementation on up to 256 processors of the Cray T3D parallel computer. It has been a common belief that parallel sparse triangular solvers scale poorly because of their high communication-to-computation ratio. Our analysis and experiments show that, although they are not as scalable as the best parallel sparse Cholesky factorization algorithms, parallel sparse triangular solvers can yield reasonable speedups on hundreds of processors. We also show that, for a wide class of problems, the sparse triangular solvers described in this paper are optimal and asymptotically as scalable as a dense triangular solver.
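For orientation, the two triangular solves named in the title can be sketched sequentially as follows. This is a minimal illustration only, not the parallel algorithm analyzed in the paper. It assumes a Cholesky-style setting in which forward substitution solves L y = b and back substitution solves L^T x = y; the row-wise storage (`rows`, `diag`) and the function names are hypothetical choices made for this example.

```python
def forward_substitution(rows, diag, b):
    """Solve L y = b for a sparse lower-triangular L.

    Assumed (hypothetical) storage: rows[i] lists (j, value) pairs for the
    strictly-lower nonzeros of row i (j < i); diag[i] holds L[i][i].
    """
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = b[i]
        # Subtract contributions of the already-computed unknowns.
        for j, v in rows[i]:
            s -= v * y[j]
        y[i] = s / diag[i]
    return y


def back_substitution(rows, diag, y):
    """Solve L^T x = y using the same row-wise storage of L."""
    n = len(y)
    x = list(y)
    for i in range(n - 1, -1, -1):
        # All updates from later unknowns have been applied; finalize x[i].
        x[i] /= diag[i]
        # Row i of L is column i of L^T: push x[i] into earlier equations.
        for j, v in rows[i]:
            x[j] -= v * x[i]
    return x


# Small example: L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]]
rows = [[], [(0, 1.0)], [(1, 4.0)]]
diag = [2.0, 3.0, 5.0]
b = [8.0, 58.0, 147.0]

y = forward_substitution(rows, diag, b)
x = back_substitution(rows, diag, y)
```

Each solve touches every nonzero of L once and performs only about two floating-point operations per nonzero, which is why the communication-to-computation ratio becomes a concern once the work is distributed across processors.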
