The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines

This paper describes the design and implementation of three core LU, QR and Cholesky factorization routines included in the out-of-core extension of ScaLAPACK. These routines allow the factorization and solution of a dense system that is too large to fit entirely in physical memory. An image of the full matrix is maintained on disk, and the factorization routines transfer sub-matrices into memory to be operated on. A 'left-looking' column-oriented variant of the factorization algorithm is implemented to reduce the disk I/O traffic. The routines are implemented using a portable I/O interface and use high-performance ScaLAPACK factorization routines as in-core computational kernels. We present the details of the implementation of the out-of-core ScaLAPACK factorization routines, as well as performance and scalability results on the Intel Paragon.
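The left-looking, column-oriented idea can be illustrated with a minimal serial sketch for the Cholesky case. This is not the ScaLAPACK implementation: the names `PanelStore` and `ooc_cholesky_left_looking` are hypothetical, the "disk" is simulated by an in-memory dictionary of column panels, and the in-core kernel is a plain Python loop rather than a ScaLAPACK routine. The sketch only shows the access pattern: each panel is read once, updated with all previously factored panels to its left, factored in memory, and written back once.

```python
import math


class PanelStore:
    """Toy stand-in for the on-disk matrix image (hypothetical, for illustration).

    The matrix is split into column panels of a fixed width; each panel is a
    list of full-length columns. Reads and writes are counted to mimic I/O.
    """

    def __init__(self, a, width):
        n = len(a)
        self.n, self.width = n, width
        self.panels = {}
        for p, c0 in enumerate(range(0, n, width)):
            cols = range(c0, min(c0 + width, n))
            self.panels[p] = [[float(a[i][j]) for i in range(n)] for j in cols]
        self.reads = 0
        self.writes = 0

    def read(self, p):
        self.reads += 1
        return [col[:] for col in self.panels[p]]

    def write(self, p, panel):
        self.writes += 1
        self.panels[p] = [col[:] for col in panel]

    def to_rows(self):
        """Reassemble the stored panels into a row-major list of lists."""
        cols = [col for p in sorted(self.panels) for col in self.panels[p]]
        return [[cols[j][i] for j in range(self.n)] for i in range(self.n)]


def ooc_cholesky_left_looking(store):
    """Overwrite the stored symmetric positive definite matrix with its
    lower-triangular Cholesky factor L, one column panel at a time."""
    n, w = store.n, store.width
    for j in range(len(store.panels)):
        c0 = j * w                      # global index of the panel's first column
        panel = store.read(j)           # bring the current panel into memory
        # Left-looking step: apply updates from every factored panel to the left.
        for k in range(j):
            left = store.read(k)
            for jj, col in enumerate(panel):
                r = c0 + jj             # global column index
                for lcol in left:
                    f = lcol[r]
                    for i in range(r, n):
                        col[i] -= lcol[i] * f
        # In-core factorization of the updated panel.
        for jj, col in enumerate(panel):
            r = c0 + jj
            for prev in panel[:jj]:     # updates from earlier columns in the panel
                f = prev[r]
                for i in range(r, n):
                    col[i] -= prev[i] * f
            d = math.sqrt(col[r])
            for i in range(r, n):
                col[i] /= d
            for i in range(r):          # zero the strictly upper part
                col[i] = 0.0
        store.write(j, panel)           # each panel is written back exactly once
```

In this sketch a panel is written only once, after all updates from the left have been applied, which is the property that makes the left-looking ordering attractive when writes to disk are expensive. The cost is that earlier panels are re-read for every later panel.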
