Fast and Accurate Floating Point Summation with Application to Computational Geometry

We present several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to $\lfloor 2^{F-f}/(1-2^{-f}) \rfloor + 1$ numbers can be accurately added by just summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place. In particular, if the sum is zero, it is computed exactly. We apply this result to the floating point formats in the IEEE floating point standard, and investigate its performance. Our results show that in the absence of massive cancellation (the most common case) the cost of guaranteed accuracy is about 30–40% more than the straightforward summation. If massive cancellation does occur, the cost of computing the accurate sum is about a factor of ten. Finally, we apply our algorithm in computing a robust geometric predicate (used in computational geometry), where our accurate summation algorithm improves the existing algorithm by a factor of two on a nearly coplanar set of points.
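The core idea of summing in decreasing order of exponents into a wider accumulator can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes IEEE single precision summands (f = 24) and an IEEE double precision accumulator (F = 53), and the helper names `to_single`, `max_terms`, `sum_by_decreasing_exponent`, and `naive_single_sum` are hypothetical names introduced here.

```python
import math
import struct
from fractions import Fraction

F_SUMMAND = 24  # assumed: significant bits of an IEEE single (f)
F_ACCUM = 53    # assumed: significant bits of an IEEE double (F)


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_terms(f, F):
    """Evaluate floor(2^(F-f) / (1 - 2^(-f))) + 1 exactly with rationals."""
    return math.floor(Fraction(2 ** (F - f)) / (1 - Fraction(1, 2 ** f))) + 1


def sum_by_decreasing_exponent(values):
    """Add single precision values in a double accumulator,
    largest binary exponent first."""
    singles = [to_single(v) for v in values]
    if len(singles) > max_terms(F_SUMMAND, F_ACCUM):
        raise ValueError("too many terms for the stated bound")
    # math.frexp(v) returns (mantissa, exponent); sort on the exponent only.
    ordered = sorted(singles, key=lambda v: math.frexp(v)[1], reverse=True)
    acc = 0.0  # Python floats are doubles, so this is the wider accumulator
    for v in ordered:
        acc += v
    return acc


def naive_single_sum(values):
    """Baseline for comparison: accumulate in single precision, input order."""
    acc = 0.0
    for v in values:
        acc = to_single(acc + to_single(v))
    return acc


data = [1e8, 1.0, -1e8]
print(naive_single_sum(data))            # the 1.0 is lost to rounding
print(sum_by_decreasing_exponent(data))  # the 1.0 survives
```

With these assumed formats the bound allows roughly half a billion terms, so the length check only matters for very large inputs. The sketch does not handle overflow, and it says nothing about how the paper treats inputs beyond the bound.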
