On properties of floating point arithmetics: numerical stability and the cost of accurate computations

Floating point arithmetics generally possess many regularity properties in addition to those typically used in roundoff error analyses; these properties can be exploited to produce computations that are more accurate and cost-effective than many programmers might think possible. Furthermore, many of these properties are quite simple to state and to comprehend, but few programmers seem to be aware of them (or at least willing to rely on them). This dissertation presents some of these properties and explores their consequences for computability, accuracy, cost, and portability. For example, we consider several algorithms for summing a sequence of numbers and show that, under very general hypotheses, we can compute a sum to full working precision at only somewhat greater cost than a simple accumulation, which can often produce a sum with no correct significant figures at all. This example, as well as others we present, can be generalized further by substituting still more complex algorithms; consequently, examples such as these oblige us to consider more carefully the trade-offs between cost and accuracy. At one end of the accuracy spectrum, we find one of the least obvious consequences of the properties of floating point arithmetic: the accuracy of a computation consisting of rational arithmetic operations and comparisons need not be limited by the precision of the floating point arithmetic in which it is carried out. Of course, the more accuracy desired, the greater the cost of the computation, and the cost of computing a very accurate result may be quite high; we illustrate this possibility in the case of polynomial evaluation. At the other end of the spectrum, however, we give an example of a problem for which simply computing a result to a modest guaranteed accuracy costs far less than the contortions required to accommodate inaccurate results.
As a consequence of examples such as these, we conclude that programmers and theorists alike must be willing to adopt a more sophisticated view of floating point arithmetic, even if only to recognize that computations more accurate and reliable than those presently in common use may be possible under stronger hypotheses than are customarily assumed.
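The summation example can be illustrated with a short Python sketch. It assumes IEEE 754 binary64 floats with round-to-nearest, which is what CPython's `float` provides on common platforms, and assumes no overflow or underflow. `two_sum` is Knuth's error-free transformation. It shows that the rounding error of one addition is itself a floating point number that can be recovered exactly, which is the kind of property that lets accuracy exceed working precision. `doubly_compensated_sum` follows doubly compensated summation as it is commonly presented in the literature: sort by decreasing magnitude, then carry a correction term alongside the running sum. It is an illustration under these assumptions, not a reproduction of the dissertation's own algorithms, and all function names are hypothetical. The standard library's `math.fsum` is used only as a correctly rounded reference.

```python
from __future__ import annotations

import math
from fractions import Fraction

# Assumes IEEE 754 binary64 floats with round-to-nearest (CPython's float on
# common platforms) and no overflow/underflow. Function names are hypothetical.


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and e such that a + b == s + e exactly."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(xs: list[float]) -> float:
    """Simple left-to-right accumulation; cancellation can destroy every digit."""
    total = 0.0
    for x in xs:
        total += x
    return total


def doubly_compensated_sum(xs: list[float]) -> float:
    """Doubly compensated summation, as commonly presented (a sketch)."""
    ordered = sorted(xs, key=abs, reverse=True)  # decreasing magnitude
    if not ordered:
        return 0.0
    s, c = ordered[0], 0.0  # running sum and its correction term
    for x in ordered[1:]:
        y = c + x
        u = x - (y - c)      # estimate of the error committed in y = c + x
        t = y + s
        v = y - (t - s)      # error committed in t = y + s (exact when |s| >= |y|)
        z = u + v            # combined correction
        s = t + z
        c = z - (s - t)      # part of z that did not make it into s
    return s


if __name__ == "__main__":
    xs = [1.0, 1e100, 1.0, -1e100]
    print(naive_sum(xs))               # 0.0
    print(doubly_compensated_sum(xs))  # 2.0
    print(math.fsum(xs))               # 2.0 (correctly rounded reference)

    # The pair returned by two_sum represents the exact sum of its inputs.
    s, e = two_sum(1.0, 2.0**-53)
    print(Fraction(s) + Fraction(e) == Fraction(1.0) + Fraction(2.0**-53))  # True
```

In this run, simple accumulation returns 0.0, with no correct significant figures. The compensated version returns 2.0 at the cost of an initial sort and about ten floating point operations per term instead of one. The same gap appears without any huge terms: summing `[1.0, 2.0**-53, 2.0**-53]` naively gives `1.0`, while the compensated sum gives the exact `1.0 + 2.0**-52`.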
