Exploiting the Performance of 32 bit Floating Point Arithmetic in Obtaining 64 bit Accuracy (Revisiting Iterative Refinement for Linear Systems)

Recent versions of microprocessors exhibit performance characteristics for 32 bit floating point arithmetic (single precision) that are substantially higher than for 64 bit floating point arithmetic (double precision). Examples include Intel's Pentium IV and M processors, AMD's Opteron architectures, and IBM's Cell Broadband Engine processor. When working in single precision, floating point operations can be performed up to two times faster on the Pentium and up to ten times faster on the Cell than in double precision. The performance enhancements in these architectures are obtained by accessing extensions to the basic architecture, such as SSE2 in the case of the Pentium and the vector functions on the IBM Cell. The motivation for this paper is to exploit single precision operations whenever possible and resort to double precision at critical stages, while attempting to provide full double precision results. The results described here are fairly general and can be applied to various problems in linear algebra, such as solving large sparse systems using direct or iterative methods, and some eigenvalue problems. There are limitations to the success of this process, such as when the condition number of the problem exceeds the reciprocal of the accuracy of the single precision computations. In that case the double precision algorithm should be used.
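As a rough illustration of the idea, the sketch below applies iterative refinement to a small dense system: the LU factorization and the triangular solves are rounded to single precision, while the residual and the solution update are computed in double precision. This is a minimal sketch under stated assumptions, not the paper's implementation. Single precision is emulated with the standard `struct` module so the example needs no external libraries, and the names `f32`, `lu_factor_f32`, `lu_solve_f32`, and `refine`, the stopping tolerance, the iteration cap, and the test matrix are all hypothetical choices made for illustration.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_f32(A):
    """LU with partial pivoting; every arithmetic result is rounded to binary32."""
    n = len(A)
    LU = [[f32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest pivot in column k and swap it into place.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = f32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = f32(LU[i][j] - f32(LU[i][k] * LU[k][j]))
    return LU, perm

def lu_solve_f32(LU, perm, b):
    """Solve with the single precision factors (forward, then back substitution)."""
    n = len(LU)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):                      # L has a unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(LU[i][j] * y[j]))
        y[i] = f32(y[i] / LU[i][i])
    return y

def refine(A, b, tol=1e-14, max_iter=30):
    """Mixed precision refinement; tol and max_iter are illustrative assumptions."""
    n = len(A)
    LU, perm = lu_factor_f32(A)             # the expensive step, in single precision
    x = lu_solve_f32(LU, perm, b)
    for _ in range(max_iter):
        # Residual in double precision: the "critical stage".
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        d = lu_solve_f32(LU, perm, r)       # correction reuses the single precision factors
        x = [x[i] + d[i] for i in range(n)] # update accumulated in double precision
    return x

# Hypothetical well-conditioned test problem.
A = [[4.0, 1.0, 2.0, 0.5],
     [1.0, 5.0, 1.0, 2.0],
     [2.0, 1.0, 6.0, 1.0],
     [0.5, 2.0, 1.0, 7.0]]
x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1, 0.7]
b = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(4)]

x_single = lu_solve_f32(*lu_factor_f32(A), b)
x_refined = refine(A, b)
print("single only error:", max(abs(u - v) for u, v in zip(x_single, x_true)))
print("refined error:    ", max(abs(u - v) for u, v in zip(x_refined, x_true)))
```

The factorization is the cubic-cost step, so keeping it in single precision is where the speedup would come from on the hardware described above; each refinement step costs only a matrix-vector product and two triangular solves. For a matrix whose condition number approaches the reciprocal of the single precision unit roundoff, the loop is not expected to converge, which matches the limitation stated in the abstract.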
