Reducing data cache leakage energy using a compiler-based approach

Silicon technology advances have made it possible to pack millions of transistors---switching at high clock speeds---on a single chip. While these advances bring unprecedented performance to electronic products, they also pose difficult power/energy consumption problems. For example, large number of transistors in dense on-chip cache memories consume significant static (leakage) power even if the cache is not used by the current computation. While previous compiler research studied code and data restructuring for improving data cache performance, to our knowledge, there exists no compiler-based study that targets data cache leakage power consumption. In this paper, we present code restructuring techniques for array-based and pointer-intensive applications for reducing data cache leakage energy consumption. The idea is to let the compiler analyze the application code and insert instructions that turn off cache lines that keep variables not used by the current computation. This turning-off does not destroy contents of a cache line and waking up the cache line (when it is accessed later) does not incur much overhead. Due to inherent data locality in applications, we find that, at a given time, only a small portion of the data cache needs to be active; the remaining part can be placed into a leakage-saving mode (state); i.e., they can be turned off. Our experimental results indicate that the proposed compiler-based strategy reduces the cache energy consumption significantly. We also demonstrate how different compiler optimizations can increase the effectiveness of our strategy.

[1]  Wei Zhang,et al.  Compiler-directed instruction cache leakage optimization , 2002, MICRO.

[2]  Andreas Moshovos,et al.  Low-leakage asymmetric-cell SRAM , 2002, ISLPED '02.

[3]  Mahmut T. Kandemir A compiler technique for improving whole-program locality , 2001, POPL '01.

[4]  Monica S. Lam,et al.  The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.

[5]  Kemal Ebcioglu,et al.  VLIW compilation techniques in a superscalar environment , 1994, PLDI '94.

[6]  Sharad Malik,et al.  Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.

[7]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[8]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[9]  J. Ramanujam,et al.  Tiling multidimensional iteration spaces for nonshared memory machines , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[10]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[11]  Kaushik Roy,et al.  Reducing leakage in a high-performance deep-submicron instruction cache , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[12]  Wei Li,et al.  Compiling for NUMA Parallel Machines , 1993 .

[13]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[14]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[15]  Kaushik Roy,et al.  DRG-cache: a data retention gated-ground cache for low power , 2002, DAC '02.

[16]  Anne Rogers,et al.  Supporting dynamic data structures on distributed-memory machines , 1995, TOPL.

[17]  Luca Benini,et al.  System-level power optimization: techniques and tools , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[18]  David Blaauw,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, ISCA.

[19]  Mahmut T. Kandemir,et al.  Leakage energy management in cache hierarchies , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[20]  Vivek De,et al.  A new technique for standby leakage reduction in high-performance circuits , 1998, 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.98CH36215).

[21]  Chau-Wen Tseng,et al.  Data transformations for eliminating conflict misses , 1998, PLDI.

[22]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.

[23]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[24]  Eric Rotenberg,et al.  Adaptive mode control: A static-power-efficient cache design , 2003, TECS.

[25]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[26]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[27]  Michael F. P. O'Boyle,et al.  Integrating loop and data transformations for global optimisation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).

[28]  Sharad Malik,et al.  Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.

[29]  William J. Bowhill,et al.  Design of High-Performance Microprocessor Circuits , 2001 .

[30]  Soo-Mook Moon,et al.  Parallelizing nonnumerical code with selective scheduling and software pipelining , 1997, TOPL.

[31]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).