On Parallel Numerical Software Libraries

We survey design principles for writing numerical linear algebra software for high-performance computers, especially parallel computers. After reviewing the architectural features of modern machines, we show how to exploit them in the design of a dense Gaussian elimination subroutine, using software from the LAPACK and ScaLAPACK projects to illustrate.
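The blocked, "right-looking" organization alluded to above can be summarized in a few lines of code. The sketch below is illustrative rather than taken from the paper: it omits the partial pivoting that LAPACK's dgetrf performs, and the function name blocked_lu and the block size nb are our own choices. Its point is that, once a narrow panel of columns has been factored, the bulk of the arithmetic becomes one triangular solve and one large matrix-matrix multiply, i.e. level-3 BLAS operations that reuse data in cache and parallelize well.

```python
# Illustrative sketch (not from the paper): right-looking blocked LU
# factorization WITHOUT pivoting, showing how most of the flops are cast
# as matrix-matrix (level-3 BLAS) operations. LAPACK's dgetrf uses the
# same blocked structure, but adds partial pivoting and tuned block sizes.
import numpy as np


def blocked_lu(A, nb=64):
    """Return the LU factors of A packed into one array (L unit lower
    triangular below the diagonal, U on and above it). No pivoting, so
    this is only reliable for e.g. diagonally dominant matrices; it is
    meant to show the blocking, not to replace a library routine."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for k in range(0, n, nb):
        b = min(nb, n - k)
        # 1. Factor the current panel A[k:n, k:k+b] with unblocked
        #    (level-2) elimination; this is the only non-block-friendly part.
        for j in range(k, k + b):
            A[j + 1:, j] /= A[j, j]
            A[j + 1:, j + 1:k + b] -= np.outer(A[j + 1:, j], A[j, j + 1:k + b])
        if k + b < n:
            # 2. Triangular solve for the block row of U (level-3 work).
            L11 = np.tril(A[k:k + b, k:k + b], -1) + np.eye(b)
            A[k:k + b, k + b:] = np.linalg.solve(L11, A[k:k + b, k + b:])
            # 3. Rank-b update of the trailing submatrix: one large
            #    matrix-matrix multiply, where almost all the flops live.
            A[k + b:, k + b:] -= A[k + b:, k:k + b] @ A[k:k + b, k + b:]
    return A
```

A production routine such as LAPACK's dgetrf (or scipy.linalg.lu_factor, which wraps it) adds row pivoting for stability; in ScaLAPACK the same blocked structure is distributed over a two-dimensional block-cyclic data layout so that the panel factorization, triangular solve, and trailing update each map onto subsets of a process grid.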
