Wattch: a framework for architectural-level power analysis and optimizations

Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning is complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them over a large space of design possibilities. This paper presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture level. Wattch is at least 1000X faster than existing layout-level power tools, yet its estimates stay within 10% of theirs, as verified using industry tools on leading-edge designs. This paper presents several validations of Wattch's accuracy. In addition, we present three examples that demonstrate how architects or compiler writers might use Wattch to evaluate power consumption in their design process. We see Wattch as a complement to existing lower-level tools; it allows architects to explore and cull the design space early on, using faster, higher-level tools. It also opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
