Enhancing Compiler Techniques for Memory Energy Optimizations

As both chip densities and clock frequencies steadily rise in modern microprocessors, energy consumption is quickly joining performance as a key design constraint. Power issues are increasingly important in embedded systems, especially those found in portable devices. Much research has focused on the memory subsystems of these devices since they are a leading energy consumer. Compiler optimizations that are traditionally used to increase performance have shown much promise in also reducing cache energy consumption. In this paper we study the interaction between performance-oriented compiler optimizations and memory energy consumption and demonstrate that the best performance optimizations do not necessarily generate the best energy behavior in memory. We also show a simple metric that a power-optimizing compiler can utilize in order to capture the energy impact of potential optimizations. Next, we present heuristic algorithms that determine a suitable optimization strategy given a memory energy upper bound. Finally, we demonstrate that our strategies will gain even more importance in the future when leakage energy is expected to play an even larger role in the total energy consumption equation.

[1]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[2]  Rainer Leupers,et al.  Code optimization techniques for embedded processors - methods, algorithms, and tools , 2000 .

[3]  Dr. Rainer Leupers Code Optimization Techniques for Embedded Processors , 2000, Springer US.

[4]  Mahmut T. Kandemir,et al.  Tuning garbage collection in an embedded Java environment , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[5]  Alvin R. Lebeck,et al.  Power aware page allocation , 2000, SIGP.

[6]  Richard T. Witek,et al.  A 160 MHz 32 b 0.5 W CMOS RISC microprocessor , 1996, 1996 IEEE International Solid-State Circuits Conference. Digest of TEchnical Papers, ISSCC.

[7]  Michael Wolfe,et al.  High performance compilers for parallel computing , 1995 .

[8]  Mahmut T. Kandemir,et al.  DRAM energy management using software and hardware directed power mode control , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[9]  Steve Furber ARM System-on-Chip Architecture , 2000 .

[10]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[11]  Norman P. Jouppi,et al.  WRL Research Report 93/5: An Enhanced Access and Cycle Time Model for On-chip Caches , 1994 .

[12]  Anantha Chandrakasan,et al.  JouleTrack: a web based tool for software energy profiling , 2001, DAC '01.

[13]  James Laudon,et al.  System overview of the SGI Origin 200/2000 product line , 1997, Proceedings IEEE COMPCON 97. Digest of Papers.

[14]  Kaushik Roy,et al.  An integrated circuit/architecture approach to reducing leakage in deep-submicron high-performance I-caches , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[15]  Aviral Shrivastava,et al.  An efficient compiler technique for code size reduction using reduced bit-width ISAs , 2002, Proceedings 2002 Design, Automation and Test in Europe Conference and Exhibition.

[16]  Marco Zagha,et al.  OriginTM 2000 and Onyx2® Performance Tuning and Optimization Guide , 1993 .

[17]  Vivek De,et al.  A new technique for standby leakage reduction in high-performance circuits , 1998, 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.98CH36215).

[18]  Mark C. Johnson,et al.  Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks , 1998, ISLPED '98.

[19]  Bob Francis,et al.  Silicon Graphics Inc. , 1993 .

[20]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[21]  Jan M. Rabaey,et al.  Digital Integrated Circuits: A Design Perspective , 1995 .

[22]  Anantha P. Chandrakasan,et al.  Low Power Digital CMOS Design , 1995 .

[23]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[24]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[25]  Ruben W. Castelino,et al.  Internal Organization of the Alpha 21164, a 300-MHz 64-bit Quad-issue CMOS RISC Microprocessor , 1995, Digit. Tech. J..

[26]  Mahmut T. Kandemir,et al.  Energy-driven integrated hardware-software optimizations using SimplePower , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[27]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[28]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[29]  C. Robert Morgan,et al.  Building an Optimizing Compiler , 1998 .

[30]  Michael E. Wolf,et al.  Combining Loop Transformations Considering Caches and Scheduling , 2004, International Journal of Parallel Programming.

[31]  Rainer Leupers,et al.  Function inlining under code size constraints for embedded processors , 1999, 1999 IEEE/ACM International Conference on Computer-Aided Design. Digest of Technical Papers (Cat. No.99CH37051).

[32]  Margaret Martonosi,et al.  Run-time power estimation in high performance microprocessors , 2001, ISLPED '01.

[33]  Mary Jane Irwin,et al.  Techniques for low energy software , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.

[34]  Gurindar S. Sohi,et al.  A static power model for architects , 2000, MICRO 33.

[35]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[36]  Mahmut T. Kandemir,et al.  Reducing memory requirements of nested loops for embedded systems , 2001, Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232).

[37]  Kanad Ghose,et al.  Analytical energy dissipation models for low-power caches , 1997, ISLPED '97.

[38]  Ken Kennedy,et al.  Procedure cloning , 1992, Proceedings of the 1992 International Conference on Computer Languages.

[39]  Ibrahim N. Hajj,et al.  Architectural and compiler support for energy reduction in the memory hierarchy of high performance microprocessors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[40]  Francky Catthoor,et al.  Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design , 1998 .