Chapter One - An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.

[1]  Andrew Wolfe,et al.  Power Analysis Of Embedded Software: A First Step Towards Software Power Minimization , 1994, IEEE/ACM International Conference on Computer-Aided Design.

[2]  Wann-Yun Shieh,et al.  Saving register-file static power by monitoring short-lived temporary-values in ROB , 2008, 2008 13th Asia-Pacific Computer Systems Architecture Conference.

[3]  Steven Swanson,et al.  Conservation cores: reducing the energy of mature computations , 2010, ASPLOS 2010.

[4]  Alireza Noruzi Google Scholar: The New Generation of Citation Indexes , 2005 .

[5]  Srilatha Manne,et al.  Power and energy reduction via pipeline balancing , 2001, ISCA 2001.

[6]  Sharad Malik,et al.  Compile-time dynamic voltage scaling settings: opportunities and limits , 2003, PLDI '03.

[7]  John Arends,et al.  Instruction fetch energy reduction using loop caches for embedded applications with small tight loops , 1999, ISLPED '99.

[8]  Margaret Martonosi,et al.  Computer Architecture Techniques for Power-Efficiency , 2008, Computer Architecture Techniques for Power-Efficiency.

[9]  Ragunathan Rajkumar,et al.  Critical power slope: understanding the runtime effects of frequency scaling , 2002, ICS '02.

[10]  Kanad Ghose,et al.  Reducing power requirements of instruction scheduling through dynamic allocation of multiple datapath resources , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[11]  Margaret Martonosi,et al.  Value-based clock gating and operation packing: dynamic strategies for improving processor power and performance , 2000, TOCS.

[12]  Yiran Chen,et al.  Deterministic clock gating for microprocessor power reduction , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[13]  Chu Shik Jhon,et al.  Dynamic register-renaming scheme for reducing power-density and temperature , 2010, SAC '10.

[14]  Margaret Martonosi,et al.  Dynamically exploiting narrow width operands to improve processor power and performance , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[15]  Kanad Ghose,et al.  Register Packing: Exploiting Narrow-Width Operands for Reducing Register File Pressure , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[16]  Narayanan Vijaykrishnan,et al.  Impact of scaling on the effectiveness of dynamic power reduction schemes , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[17]  Antonio González,et al.  Energy-effective issue logic , 2001, ISCA 2001.

[18]  Jeroen Bosman,et al.  Scopus reviewed and compared: the coverage and functionality of the citation database Scopus, including comparisons with Web of Science and Google Scholar , 2006 .

[19]  Ying Fai Tong,et al.  Minimizing Floating-Point Power Dissipation Via Bitwidth Reduction , 2006 .

[20]  Trevor Mudge,et al.  Automatic Performance Setting for Dynamic Voltage Scaling , 2002 .

[21]  Niraj K. Jha,et al.  Joint dynamic voltage scaling and adaptive body biasing for heterogeneous distributed real-time embedded systems , 2003, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[22]  Mark Horowitz,et al.  Energy dissipation in general purpose microprocessors , 1996, IEEE J. Solid State Circuits.

[23]  M. Valero,et al.  Fuzzy memoization for floating-point multimedia applications , 2005, IEEE Transactions on Computers.

[24]  Margaret Martonosi,et al.  Runtime power monitoring in high-end processors: methodology and empirical data , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[25]  Diana Marculescu,et al.  Power and performance evaluation of globally asynchronous locally synchronous processors , 2002, ISCA.

[26]  Chia-Lin Yang,et al.  HotSpot cache: joint temporal and spatial locality exploitation for I-cache energy reduction , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[27]  Scott Shenker,et al.  Scheduling for reduced CPU energy , 1994, OSDI '94.

[28]  Margaret Martonosi,et al.  Formal online methods for voltage/frequency control in multiple clock domain microprocessors , 2004, ASPLOS XI.

[29]  Antonio González,et al.  Trace-level reuse , 1999, Proceedings of the 1999 International Conference on Parallel Processing.

[30]  Larry L. Biro,et al.  Power considerations in the design of the Alpha 21264 microprocessor , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).

[31]  Larry Rudolph,et al.  Accelerating multi-media processing by implementing memoing in multiplication and division units , 1998, ASPLOS VIII.

[32]  Ibrahim N. Hajj,et al.  Using dynamic cache management techniques to reduce energy in a high-performance processor , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[33]  Mateo Valero,et al.  On the selection of adder unit in energy efficient vector processing , 2013, International Symposium on Quality Electronic Design (ISQED).

[34]  Norbert Wehn,et al.  Energy simulation of embedded XScale systems with XEEMU , 2009, J. Embed. Comput..

[35]  David M. Brooks,et al.  A circuit level implementation of an adaptive issue queue for power-aware microprocessors , 2001, GLSVLSI '01.

[36]  Trevor Mudge,et al.  Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads , 2002, ICCAD 2002.

[37]  Kevin Skadron,et al.  HotSpot: a dynamic compact thermal model at the processor-architecture level , 2003, Microelectron. J..

[38]  Carlos Alvarez Martinez,et al.  Dynamic Tolerance Region Computing for Multimedia , 2012, IEEE Transactions on Computers.

[39]  Jian Huang,et al.  Exploiting basic block value locality with block reuse , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[40]  Rastislav Bodík,et al.  Slack: maximizing performance under technological constraints , 2002, ISCA.

[41]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[42]  Michael L. Scott,et al.  Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[43]  Dirk Grunwald,et al.  Pipeline gating: speculation control for energy reduction , 1998, ISCA.

[44]  Mahmut T. Kandemir,et al.  Energy-conscious compilation based on voltage scaling , 2002, LCTES/SCOPES '02.

[45]  José González,et al.  Power-aware control speculation through selective throttling , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[46]  Avi Mendelson,et al.  Micro-operation cache: a power aware frontend for variable instruction length ISA , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[47]  Pradip Bose,et al.  Microarchitectural techniques for power gating of execution units , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[48]  Ibrahim N. Hajj,et al.  Energy and performance improvements in microprocessor design using a loop cache , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[49]  Karthikeyan Sankaralingam,et al.  Dark silicon and the end of multicore scaling , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[50]  Michael Franz,et al.  Power reduction techniques for microprocessor systems , 2005, CSUR.

[51]  Sharad Malik,et al.  Intraprogram dynamic voltage scaling: Bounding opportunities with analytic modeling , 2004, TACO.

[52]  Sandhya Dwarkadas,et al.  Dynamic frequency and voltage control for a multiple clock domain microarchitecture , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[53]  Michael L. Scott,et al.  Integrating adaptive on-chip storage structures for reduced dynamic power , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[54]  James E. Smith,et al.  Very low power pipelines using significance compression , 2000, MICRO 33.

[55]  Ulrich Kremer,et al.  The design, implementation, and evaluation of a compiler algorithm for CPU energy reduction , 2003, PLDI '03.

[56]  G.E. Moore,et al.  Cramming More Components Onto Integrated Circuits , 1998, Proceedings of the IEEE.

[57]  Margaret Martonosi,et al.  Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[58]  Margaret Martonosi,et al.  A dynamic compilation framework for controlling microprocessor energy and performance , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[59]  Emil Talpes,et al.  Toward a multiple clock/voltage island design style for power-aware processors , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.