Floating-point arithmetic

Roughly speaking, floating-point (FP) arithmetic is the way computers handle numerical quantities. Many kinds of programs rely on FP computations, such as control software, weather forecasting, and hybrid systems (embedded systems mixing continuous and discrete behaviors). FP arithmetic corresponds to scientific notation with a limited number of digits for the integer significand. On modern processors, it is specified by the IEEE-754 standard, which defines formats, attributes and roundings, exceptional values, and exception handling. FP arithmetic lacks several basic properties of real arithmetic; for example, addition is not associative. FP arithmetic is therefore often considered strange and unintuitive. This chapter presents some basic knowledge about FP arithmetic, including numbers and their encoding, and operations and rounding. Further readings about FP arithmetic include.
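The two technical claims above, that FP addition is not associative and that FP numbers have a fixed-width encoding, can be checked with a minimal Python sketch. It assumes that Python's `float` is an IEEE-754 binary64 value, which holds on common platforms. The helper name `decode_binary64` is hypothetical and chosen only for illustration.

```python
import struct


def decode_binary64(x):
    """Split a float into its binary64 fields (hypothetical helper).

    Returns (sign, biased_exponent, fraction), assuming the platform
    float is IEEE-754 binary64: 1 sign bit, 11 exponent bits, 52
    fraction bits.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, biased_exponent, fraction


# Non-associativity: the same three operands, grouped differently.
big = 1e100
grouped_left = (big + -big) + 1.0   # the large terms cancel first, then 1.0 is added
grouped_right = big + (-big + 1.0)  # 1.0 is absorbed by rounding in -big + 1.0

# A smaller-scale instance with decimal fractions that are not exactly
# representable in binary.
small_left = (0.1 + 0.2) + 0.3
small_right = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print(grouped_left, grouped_right)
    print(small_left == small_right)
    print(decode_binary64(1.0))
```

In the first grouping the two large terms cancel exactly before 1.0 is added. In the second, 1.0 is far below the spacing between representable numbers near 1e100, so it is lost to rounding before the cancellation happens.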
