P6 Binary Floating-Point Unit

The floating point unit of the next generation PowerPC is detailed. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4 using a technology independent measure. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly equivalent to its predecessor, the Power 5, even though its technology independent frequency has increased over 70%. Overall the frequency has improved over 100%. It achieves this high performance through aggressive feedback paths, circuit design and layout. The pipeline has 7 stages but data may be fed back to dependent operations prior to rounding and complete normalization. Division and square root algorithms are also described which take advantage of high-precision linear approximation hardware for obtaining a reciprocal or reciprocal square root approximation.

[1]  18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 25-27 June 2007, Montpellier, France , 2007, IEEE Symposium on Computer Arithmetic.

[2]  Ramesh C. Agarwal,et al.  Approximation Methods for Divide and Square Root in the Power 3 Processor , .

[3]  Charles Roth,et al.  A low-power, high-speed implementation of a PowerPC/sup TM/ microprocessor vector extension , 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[4]  Leonid Sigal,et al.  4GHz+ low-latency fixed-point and binary floating-point execution units for the POWER6 processor , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.

[5]  Eric M. Schwarz,et al.  Binary Floating-Point Unit Design , 2006 .

[6]  Ramesh C. Agarwal,et al.  Series approximation methods for divide and square root in the Power3/sup TM/ processor , 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[7]  Kevin J. Nowka,et al.  Leading zero anticipation and detection-a comparison of methods , 2001, Proceedings 15th IEEE Symposium on Computer Arithmetic. ARITH-15 2001.

[8]  Xiao Yan Yu,et al.  A 5GHz+ 128-bit Binary Floating-Point Adder for the POWER6 Processor , 2006, 2006 Proceedings of the 32nd European Solid-State Circuits Conference.

[9]  Erdem Hokenek,et al.  Leading-Zero Anticipator (LZA) in the IBM RISC System/6000 Floating-Point Execution Unit , 1990, IBM J. Res. Dev..

[10]  Eric M. Schwarz,et al.  High performance floating-point unit with 116 bit wide divider , 2003, Proceedings 2003 16th IEEE Symposium on Computer Arithmetic.

[11]  Eric M. Schwarz,et al.  Hardware implementations of denormalized numbers , 2003, Proceedings 2003 16th IEEE Symposium on Computer Arithmetic.

[12]  Peter W. Markstein,et al.  IA-64 and elementary functions - speed and precision , 2000 .

[13]  Eric M. Schwarz,et al.  FPU implementations with denormalized numbers , 2005, IEEE Transactions on Computers.

[14]  Erdem Hokenek,et al.  Design of the IBM RISC System/6000 Floating-Point Execution Unit , 1990, IBM J. Res. Dev..

[15]  Stuart F. Oberman,et al.  Floating point division and square root algorithms and implementation in the AMD-K7/sup TM/ microprocessor , 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).

[16]  James Demmel,et al.  IEEE Standard for Floating-Point Arithmetic , 2008 .

[17]  Sang H. Dhong,et al.  The vector floating-point unit in a synergistic processor element of a CELL processor , 2005, 17th IEEE Symposium on Computer Arithmetic (ARITH'05).