Computer Architecture Techniques for Power-Efficiency

In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. In the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance. Now, power efficiency must be taken into account at every step of the design process. For some time, architects succeeded in delivering 40% to 50% annual improvements in processor performance, but costs that were previously brushed aside eventually caught up with them. The most critical of these costs is the inexorable increase in power dissipation and power density in processors.

Power dissipation issues have catalyzed new topic areas in computer architecture, resulting in a substantial body of work on more power-efficient architectures. Power dissipation, coupled with diminishing performance gains, was also the main cause of the switch from single-core to multi-core architectures and of the slowdown in clock-frequency increases. This book aims to document some of the most important architectural techniques that were invented, proposed, and applied to reduce both dynamic and static power dissipation in processors and memory hierarchies. A significant number of techniques have been proposed for a wide range of situations, and this book synthesizes them by focusing on their common characteristics.

Table of Contents: Introduction / Modeling, Simulation, and Measurement / Using Voltage and Frequency Adjustments to Manage Dynamic Power / Optimizing Capacitance and Switching Activity to Reduce Dynamic Power / Managing Static (Leakage) Power / Conclusions
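The split between dynamic and static power follows the standard first-order CMOS power model. Dynamic (switching) power is approximately P_dyn = a * C * V^2 * f, where a is the switching activity, C is the switched capacitance, V is the supply voltage, and f is the clock frequency. Static (leakage) power is approximately P_static = V * I_leak. The voltage/frequency and capacitance/switching-activity chapter titles correspond to terms of the first formula. The leakage chapter corresponds to the second.

The Python sketch below evaluates these two formulas. It is only an illustration: the function names and the operating-point numbers are hypothetical and do not come from the book.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """First-order CMOS dynamic (switching) power: alpha * C * V^2 * f, in watts."""
    return alpha * capacitance * voltage ** 2 * frequency


def static_power(voltage, leakage_current):
    """First-order static (leakage) power: V * I_leak, in watts."""
    return voltage * leakage_current


if __name__ == "__main__":
    # Hypothetical operating point; illustrative numbers only.
    alpha, cap = 0.1, 1e-8        # activity factor, switched capacitance (F)
    v_nom, f_nom = 1.2, 2.0e9     # supply voltage (V), clock frequency (Hz)
    i_leak = 0.5                  # leakage current (A)

    p_nom = dynamic_power(alpha, cap, v_nom, f_nom)
    # Scale voltage and frequency together by 0.8, as DVFS would.
    p_scaled = dynamic_power(alpha, cap, 0.8 * v_nom, 0.8 * f_nom)
    print(f"dynamic: {p_nom:.3f} W -> {p_scaled:.3f} W ({p_scaled / p_nom:.3f}x)")
    print(f"static at nominal V: {static_power(v_nom, i_leak):.2f} W")
```

Dynamic power grows with V^2, and to first order the achievable frequency tracks V. Scaling both V and f by a factor s therefore cuts dynamic power by roughly s^3 (here 0.8^3 = 0.512), while performance drops only about linearly. Static power has no f term at all. This is why leakage calls for the separate techniques grouped under static power management.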
