Approximation Methods for Divide and Square Root in the Power3 Processor

The Power3 processor is a 64-bit implementation of the PowerPC™ architecture and is the successor to the Power2™ processor for workstations and servers that require high-performance floating-point capability. The previous processors used Newton-Raphson algorithms for their implementations of divide and square root. The Power3 processor has a longer pipeline latency, which would substantially increase the latency of these instructions if the Newton-Raphson approach were retained. Instead, new algorithms based on power series approximations were developed, which provide significantly better performance than the Newton-Raphson algorithm on this processor. This paper describes the algorithms and then shows how both the series-based algorithms and the Newton-Raphson algorithms are affected by pipeline length. For the Power3, the power series algorithms reduce the divide latency by over 20% and the square root latency by 35%.
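
To illustrate the contrast drawn above, the following is a minimal sketch of the two general techniques for computing a reciprocal, not the Power3 implementation itself. The function names, the starting estimate `x0` (standing in for a table lookup), and the iteration and term counts are all assumptions chosen for illustration. Both routines start from the same estimate and work with the residual e = 1 - b*x0. Each Newton-Raphson step depends on the result of the previous one, whereas the series form evaluates a polynomial in the residual.

```python
def reciprocal_newton(b, x0, iterations):
    """Refine an estimate x0 of 1/b with Newton-Raphson steps.

    Each step squares the relative error, but it cannot start
    until the previous step has finished (a serial dependency).
    """
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - b * x)
    return x


def reciprocal_series(b, x0, terms):
    """Approximate 1/b from x0 using a truncated power series.

    With e = 1 - b*x0, we have 1/b = x0 / (1 - e)
    = x0 * (1 + e + e^2 + ...). The sum is truncated after
    `terms` terms and evaluated here with Horner's rule.
    """
    e = 1.0 - b * x0
    acc = 1.0                    # polynomial with a single term
    for _ in range(terms - 1):   # each pass adds one more power of e
        acc = 1.0 + e * acc
    return x0 * acc


# Hypothetical inputs: divisor 1.5 and a rough estimate of 1/1.5.
b, x0 = 1.5, 0.66
newton = reciprocal_newton(b, x0, iterations=3)
series = reciprocal_series(b, x0, terms=8)
print(newton, series)
```

With a residual of about 0.01, three Newton-Raphson steps and an eight-term series both leave a relative error on the order of the residual raised to the eighth power. Horner's rule is used here only for brevity; it is itself a serial chain, and a hardware-oriented design would be free to arrange the polynomial evaluation differently.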
