Integrated analysis of power and performance for pipelined microprocessors

Choosing the pipeline depth is one of the most critical decisions an architect must make in the concept phase of a microprocessor design. To succeed in today's cost/performance marketplace, modern CPU designs must balance performance against power dissipation, and the choice of pipeline depth and target clock frequency has a critical impact on both metrics. We describe an optimization methodology, based on both analytical models and detailed simulations, for power and performance as a function of pipeline depth. Our results for a set of SPEC2000 applications show that, when power and performance are optimized together, the optimal clock period is around 18 FO4 (fan-out-of-four inverter delays). We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of our energy models. Finally, we discuss the potential risks to design quality of overly aggressive or overly conservative choices of pipeline depth.
