Power balanced pipelines

Since the onset of pipelined processors, balancing the delay of the microarchitectural pipeline stages such that each microarchitectural pipeline stage has an equal delay has been a primary design objective, as it maximizes instruction throughput. Unfortunately, this causes significant energy inefficiency in processors, as each microarchitectural pipeline stage gets the same amount of time to complete, irrespective of its size or complexity. For power-optimized processors, the inefficiency manifests itself as a significant imbalance in power consumption of different microarchitectural pipestages. In this paper, rather than balancing processor pipelines for delay, we propose the concept of power balanced pipelines - i.e., processor pipelines in which different delays are assigned to different microarchitectural pipestages to reduce the power disparity between the stages while guaranteeing the same processor frequency/performance. A specific implementation of the concept uses cycle time stealing [19] to deliberately redistribute cycle time from low-power pipeline stages to power-hungry stages, relaxing their timing constraints and allowing them to operate at reduced voltages or use smaller, less leaky cells. We present several static and dynamic techniques for power balancing and demonstrate that balancing pipeline power rather than delay can result in 46% processor power reduction with no loss in processor throughput for a full FabScalar processor over a power-optimized baseline. Benefits are comparable over a Fabscalar baseline where static cycle time stealing is used to optimize achieved frequency. Power savings increase at lower operating frequencies. To the best of our knowledge, this is the first such work on microarchitecture-level power reduction that guarantees the same performance.

[1]  Aaas News,et al.  Book Reviews , 1893, Buffalo Medical and Surgical Journal.

[2]  David Blaauw,et al.  Reducing pipeline energy demands with local DVS and dynamic retiming , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[3]  Seth Copen Goldstein,et al.  Operation chaining asynchronous pipelined circuits , 2007, ICCAD 2007.

[4]  Scott Hauck,et al.  Asynchronous design methodologies: an overview , 1995, Proc. IEEE.

[5]  H. Peter Hofstee,et al.  Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.

[6]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[7]  Sanjay J. Patel,et al.  Rigel: an architecture and scalable programming interface for a 1000-core accelerator , 2009, ISCA '09.

[8]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[9]  Eric Rotenberg,et al.  FabScalar: Composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[10]  Greg Hamerly,et al.  SimPoint 3.0: Faster and More Flexible Program Analysis , 2005 .

[11]  Lei He,et al.  Temperature and supply Voltage aware performance and power modeling at microarchitecture level , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[12]  Gu-Yeon Wei,et al.  ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency , 2008, 2008 International Symposium on Computer Architecture.

[13]  Dean M. Tullsen,et al.  Fellowship - Simulation And Modeling Of A Simultaneous Multithreading Processor , 1996, Int. CMG Conference.

[14]  Margaret Martonosi,et al.  Formal online methods for voltage/frequency control in multiple clock domain microprocessors , 2004, ASPLOS XI.

[15]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[16]  Josep Torrellas,et al.  ReCycle:: pipeline adaptation to tolerate process variation , 2007, ISCA '07.