Stall Cycle Redistribution in a Transparent Fetch Pipeline

Power and power density are now primary design constraints for modern high-performance microprocessors. Up to 70% of dynamic power consumption can be attributed to the clocking system. As a result, clock gating has emerged as both a necessary and an efficient method for reducing dynamic power. Transparent pipelining, a recently proposed fine-grain clock gating technique, has the potential to reduce clock power significantly beyond what conventional pipestage-level clock gating achieves. Previous studies of transparent pipelining have focused on circuit and implementation issues while neglecting the broader microarchitectural implications. This paper aims to quantify the microarchitectural opportunities afforded by transparent pipelining in a processor's fetch pipeline. We develop a technique, based on stall cycle redistribution, designed to improve the performance of transparent pipelining on fetch and other high-utilization pipelines. We show that stall cycle redistribution can dramatically reduce the clocking overhead of an aggressively pipelined Cell-like microprocessor.
