Parallel Cholesky factorization of a block tridiagonal matrix

We discuss the parallel implementation of the Cholesky factorization of a symmetric positive definite matrix that is block tridiagonal. Parallel implementations for this problem, and for closely related problems such as the factorization of banded matrices, have been reported in the literature, but they addressed only the special cases in which the block size (bandwidth) is either very large (wide) or very small (narrow). We present a solution that covers the entire spectrum, from very large blocks to very small ones. Preliminary performance results collected on a Cray T3E-600 distributed-memory supercomputer show respectable performance: factoring a matrix with block size b=1000 and a total dimension of more than 500,000 takes about 3.6 minutes on 128 processors.
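For orientation, the factorization itself can be sketched sequentially. The NumPy code below is a minimal illustration of the standard block recurrence for A = L L^T with L block lower bidiagonal; it is not the parallel implementation discussed here, and the function names (`block_tridiag_cholesky`, `assemble`) and the list-of-blocks storage are assumptions made for this example only.

```python
import numpy as np

def block_tridiag_cholesky(diag, sub):
    """Sequential sketch (hypothetical helper, not the paper's parallel code).

    diag: list of n symmetric b-by-b blocks A[i][i]
    sub:  list of n-1 b-by-b blocks A[i+1][i]
    Returns (L_diag, L_sub), the diagonal and subdiagonal blocks of L.
    """
    n = len(diag)
    L_diag, L_sub = [], []
    S = diag[0].copy()  # current Schur complement on the diagonal
    for i in range(n):
        # Dense Cholesky of the updated diagonal block: S = L_ii L_ii^T
        L_ii = np.linalg.cholesky(S)
        L_diag.append(L_ii)
        if i + 1 < n:
            # L_{i+1,i} = A_{i+1,i} L_ii^{-T}, i.e. solve L_ii X^T = A_{i+1,i}^T
            L_next = np.linalg.solve(L_ii, sub[i].T).T
            L_sub.append(L_next)
            # Update the next diagonal block
            S = diag[i + 1] - L_next @ L_next.T
    return L_diag, L_sub

def assemble(diag, sub, lower_only=False):
    """Build the dense matrix from its blocks (for checking only)."""
    n, b = len(diag), diag[0].shape[0]
    M = np.zeros((n * b, n * b))
    for i in range(n):
        M[i * b:(i + 1) * b, i * b:(i + 1) * b] = diag[i]
    for i in range(n - 1):
        M[(i + 1) * b:(i + 2) * b, i * b:(i + 1) * b] = sub[i]
        if not lower_only:
            M[i * b:(i + 1) * b, (i + 1) * b:(i + 2) * b] = sub[i].T
    return M

# Small demo: A = tridiag(1, 4, 1) (x) I_b, which is symmetric positive definite.
b, n = 2, 4
diag = [4.0 * np.eye(b) for _ in range(n)]
sub = [np.eye(b) for _ in range(n - 1)]
L_diag, L_sub = block_tridiag_cholesky(diag, sub)
L = assemble(L_diag, L_sub, lower_only=True)
A = assemble(diag, sub)
print("max |L L^T - A| =", np.abs(L @ L.T - A).max())
```

Each step depends on the previous one through the updated diagonal block, so this recurrence is inherently sequential across blocks. In this plain form, parallelism is available only inside the dense operations on each block, which is one way to see why very large and very small block sizes call for different strategies.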
