The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form

This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed-memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms, reductions that are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines, and presents results demonstrating the scalability of the reduction routines. The building blocks used most commonly in ScaLAPACK are the sequential BLAS, the Parallel BLAS (PBLAS), and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms consists of a series of steps, in each of which one block column (or panel), and/or block row, of the matrix is reduced, followed by an update of the portion of the matrix that has not yet been factorized. This latter phase is performed using Level 3 PBLAS operations and contains the bulk of the computation. The panel reduction phase, however, involves a significant amount of communication and is important in determining the scalability of the algorithm. The simplest way to parallelize the panel reduction phase is to replace the BLAS routines appearing in the LAPACK routine (mostly matrix-vector and matrix-matrix multiplications) with the corresponding PBLAS routines. In some cases, however, it is possible to reduce communication startup costs by performing the communication necessary for consecutive BLAS operations in a single BLACS call. Thus, there is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code.

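To make the two-phase structure concrete, here is a minimal sketch in plain NumPy (not ScaLAPACK, PBLAS, or BLACS code) of a blocked Householder factorization. Blocked QR is used as the simplest representative; the Hessenberg, tridiagonal, and bidiagonal reductions share the same panel/update skeleton. The names `house` and `blocked_qr`, the panel width `nb`, and the use of the compact WY representation Q = I - V T V^T are illustrative assumptions, not routines from the library.

```python
import numpy as np

def house(x):
    """Householder vector v and scalar tau such that
    (I - tau * v v^T) x = alpha * e1, with alpha = -sign(x[0]) * ||x||."""
    v = np.asarray(x, dtype=float).copy()
    normx = np.linalg.norm(v)
    if normx == 0.0:
        return v, 0.0          # nothing to annihilate
    alpha = -np.copysign(normx, v[0])
    v[0] -= alpha              # v = x - alpha*e1; sign choice avoids cancellation
    tau = 2.0 / np.dot(v, v)
    return v, tau

def blocked_qr(A, nb=4):
    """Blocked Householder QR illustrating the panel/update structure.

    Returns the triangularized matrix and a list of (V, T) block
    reflectors in compact WY form, Q = I - V T V^T."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    reflectors = []
    for j in range(0, min(m, n), nb):
        b = min(nb, min(m, n) - j)
        V = np.zeros((m - j, b))
        T = np.zeros((b, b))
        # Panel phase: Level 2 work, one Householder vector per column.
        # On a distributed matrix this is the communication-heavy part.
        for k in range(b):
            col = j + k
            v, tau = house(A[col:, col])
            # Apply H_k only to the remaining columns of the panel.
            w = tau * (v @ A[col:, col:j + b])
            A[col:, col:j + b] -= np.outer(v, w)
            V[k:, k] = v
            # Grow the triangular factor T of the compact WY form.
            T[:k, k] = -tau * (T[:k, :k] @ (V[:, :k].T @ V[:, k]))
            T[k, k] = tau
        # Update phase: apply Q^T = I - V T^T V^T to the trailing matrix.
        # These are pure matrix-matrix products (Level 3 work).
        if j + b < n:
            C = A[j:, j + b:]
            C -= V @ (T.T @ (V.T @ C))
        reflectors.append((V, T))
    return A, reflectors
```

In a distributed-memory setting, the panel loop is where the communication discussed above arises (each `house` call needs a norm reduction and a broadcast of v, and consecutive broadcasts are candidates for combining into a single message), while the rank-b trailing update maps directly onto Level 3 operations. Applying the stored (V, T) reflectors to the identity reconstructs Q, and Q times the returned triangular factor reproduces the input up to roundoff.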