Design, implementation and testing of extended and mixed precision BLAS

This article describes the design rationale, a C implementation, and conformance testing of a subset of the new Standard for the BLAS (Basic Linear Algebra Subprograms): Extended and Mixed Precision BLAS. Permitting higher internal precision and mixed input/output types and precisions allows us to implement some algorithms that are simpler, more accurate, and sometimes faster than is possible without these features. The new BLAS are challenging to implement and test because there are many more subroutines than in the existing Standard, and because we must be able to assess whether a higher precision is used for internal computations than is used for either input or output variables. We have therefore developed an automated process for generating and systematically testing these routines. Our methodology is applicable to languages besides C. In particular, the algorithms used in our testing code will be valuable to all other BLAS implementors. Our extra precision routines achieve excellent performance: close to half of the machine peak Megaflop rate even for the Level 2 BLAS, when the data access is stride one.
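To make the idea of higher internal precision concrete, the following is a minimal sketch in C of a dot product whose inputs and output are single precision while the accumulation is carried out in double precision. The function names `dot_float_plain` and `dot_float_extended` are hypothetical and chosen for illustration only; they are not the interface defined by the Standard or by the implementation described here, which may use other techniques to obtain extra precision.

```c
#include <stddef.h>

/* Hypothetical name: accumulates in float, the same precision as the inputs. */
float dot_float_plain(size_t n, const float *x, const float *y)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}

/* Hypothetical name: widens each operand to double, accumulates in double,
 * and rounds to float only once, when the result is returned. */
float dot_float_extended(size_t n, const float *x, const float *y)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += (double)x[i] * (double)y[i];
    return (float)acc;
}
```

With `x = {1e8f, 1.0f, -1e8f}` and `y` all ones, `dot_float_extended` returns 1, because every intermediate sum is exactly representable in double. On platforms that evaluate `float` expressions in single precision, `dot_float_plain` would be expected to lose the middle term to rounding and return 0. This also hints at why testing is hard: a conformance test has to choose inputs for which the two behaviours are distinguishable.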
