P6 Binary Floating-Point Unit

The floating-point unit of the next-generation PowerPC processor is described in detail. It has been tested at over 5 GHz. The design supports an extremely aggressive cycle time of 13 FO4, a technology-independent measure of delay expressed in fanout-of-four inverter delays. For most dependent instructions, its fused multiply-add dataflow has only 6 effective pipeline stages. This is nearly the same as its predecessor, the POWER5, even though its technology-independent frequency has increased by over 70%. Overall, the frequency has improved by over 100%. It achieves this high performance through aggressive feedback paths, circuit design, and layout. The pipeline has 7 stages, but data may be fed back to dependent operations before rounding and complete normalization. Division and square-root algorithms are also described. They take advantage of high-precision linear approximation hardware to obtain a reciprocal or reciprocal square root approximation.
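The abstract names the division and square-root strategy but does not spell it out. The following is a minimal Python sketch of the general technique under stated assumptions. It assumes a hypothetical 64-segment piecewise-linear (chord) table for the initial reciprocal and reciprocal square root estimates. Those estimates are then refined with Newton-Raphson steps built from fused multiply-adds. The table size, the iteration count, and the names `fma`, `fp_divide`, and `fp_sqrt` are illustrative assumptions. This is not the POWER6 hardware algorithm, whose approximation precision and refinement sequence may differ.

```python
import math
from fractions import Fraction

# Hypothetical table size: 64 segments per lookup table (illustrative only).
SEGMENTS = 64


def fma(x, y, z):
    """Emulate a fused multiply-add: x*y + z computed exactly, rounded once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def _build_table(start, width, f):
    """Chord (secant) segments of f: one entry (lo, f(lo), slope) per segment."""
    table = []
    for i in range(SEGMENTS):
        lo = start + i * width
        table.append((lo, f(lo), (f(lo + width) - f(lo)) / width))
    return table


RECIP_WIDTH = 1.0 / SEGMENTS   # 1/m sampled on [1, 2)
RSQRT_WIDTH = 3.0 / SEGMENTS   # 1/sqrt(m) sampled on [1, 4)
RECIP_TABLE = _build_table(1.0, RECIP_WIDTH, lambda m: 1.0 / m)
RSQRT_TABLE = _build_table(1.0, RSQRT_WIDTH, lambda m: 1.0 / math.sqrt(m))


def _linear_estimate(table, width, m):
    """Table lookup plus one multiply-add: a low-precision estimate of f(m)."""
    i = min(int((m - 1.0) / width), SEGMENTS - 1)
    lo, f_lo, slope = table[i]
    return fma(m - lo, slope, f_lo)


def fp_divide(a, b):
    """a / b for finite a and positive normal b (zero/inf/NaN/denormal cases omitted)."""
    mb, eb = math.frexp(b)       # b == mb * 2**eb with 0.5 <= mb < 1
    mb, eb = mb * 2.0, eb - 1    # renormalize so that 1 <= mb < 2
    x = _linear_estimate(RECIP_TABLE, RECIP_WIDTH, mb)
    for _ in range(3):
        # Newton-Raphson step x <- x + x*(1 - mb*x); the relative error roughly squares.
        x = fma(x, fma(-mb, x, 1.0), x)
    q = a * x                    # quotient estimate for a / mb
    r = fma(-mb, q, a)           # residual a - mb*q, rounded once
    q = fma(r, x, q)             # final correction q + r*x
    return math.ldexp(q, -eb)    # undo the exponent scaling of b


def fp_sqrt(b):
    """sqrt(b) for positive normal b, via a reciprocal square root estimate."""
    m, e = math.frexp(b)
    m, e = m * 2.0, e - 1        # 1 <= m < 2
    if e % 2:                    # make the exponent even, so 1 <= m < 4
        m, e = m * 2.0, e - 1
    y = _linear_estimate(RSQRT_TABLE, RSQRT_WIDTH, m)
    for _ in range(3):
        # Newton-Raphson for 1/sqrt(m): y <- y + (y/2)*(1 - m*y*y).
        y = fma(0.5 * y, fma(-m, y * y, 1.0), y)
    s = m * y                    # sqrt(m) estimate
    s = fma(0.5 * y, fma(-s, s, m), s)   # correction s + (y/2)*(m - s*s)
    return math.ldexp(s, e // 2)
```

Each Newton step roughly doubles the number of correct bits. A more precise initial estimate, such as the high-precision linear approximation hardware described in the abstract, therefore leaves fewer dependent FMA operations for refinement. In this sketch, three iterations is a deliberately conservative choice. The `fma` helper is emulated with `fractions.Fraction` so the sketch runs on any Python 3. Python 3.13 and later also provide `math.fma`.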

Eric M. Schwarz | Martin S. Schmookler | Michael Kroener | Son Dao Trong

Published in: 18th IEEE Symposium on Computer Arithmetic (ARITH-18 2007), 25-27 June 2007, Montpellier, France.