A compiler approach for reducing data cache energy

Silicon technology advances have made it possible to pack millions of transistors --- switching at high clock speeds --- on a single chip. While these advances bring unprecedented performance to electronic products, they pose difficult power/energy consumption problems. For example, large number of transistors in dense on-chip cache memories consume significant static (leakage) power even if the cache is not used by the current computation. While previous compiler research studied code and data restructuring for improving data cache performance, to our knowledge, there is no compiler-based study that targets data cache leakage power consumption. In this paper, we present code restructuring techniques for array-based and pointer-intensive applications for reducing data cache energy consumption. The idea is to let the compiler to analyze the code and insert instructions that turn off cache lines that keep variables not used by the current computation. This turning off does not destroy contents of a cache line, and waking up the cache line incurs very little overhead. Due to data locality, we find that at a given time only a small portion of the data cache needs to be active; the remaining part can be placed into a leakage-saving mode (state); i.e., they can be turned off. Our preliminary results indicate that the proposed strategy reduces the cache energy consumption significantly. We also show that several compiler optimizations increase the effectiveness of our strategy.

[1]  Michael E. Wolf,et al.  The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.

[2]  Wei Li,et al.  Compiling for NUMA Parallel Machines , 1993 .

[3]  Vikas Agarwal,et al.  Static energy reduction techniques for microprocessor caches , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[4]  R. Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[5]  Philip H. Sweany,et al.  Improving software pipelining with unroll-and-jam , 1996, Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences.

[6]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[7]  G. Sohi,et al.  A static power model for architects , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[8]  Mahmut T. Kandemir,et al.  Leakage energy management in cache hierarchies , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[9]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[10]  T. Mudge,et al.  Drowsy caches: simple techniques for reducing leakage power , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[11]  Eric Rotenberg,et al.  Adaptive mode control: A static-power-efficient cache design , 2003, TECS.

[12]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[13]  J. Ramanujam,et al.  Tiling multidimensional iteration spaces for nonshared memory machines , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[14]  Margaret Martonosi,et al.  Cache decay: exploiting generational behavior to reduce cache leakage power , 2001, ISCA 2001.

[15]  Wei Zhang,et al.  Exploiting VLIW schedule slacks for dynamic and leakage energy reduction , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[16]  William J. Bowhill,et al.  Design of High-Performance Microprocessor Circuits , 2001 .

[17]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[18]  Anne Rogers,et al.  Supporting dynamic data structures on distributed-memory machines , 1995, TOPL.

[19]  Soo-Mook Moon,et al.  Parallelizing nonnumerical code with selective scheduling and software pipelining , 1997, TOPL.

[20]  Tadahiro Kuroda,et al.  Threshold-Volgage control schemes through substrate-bias for low-power high-speed CMOS LSI design , 1996, J. VLSI Signal Process..

[21]  Kaushik Roy,et al.  Reducing leakage in a high-performance deep-submicron instruction cache , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[22]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[23]  Luca Benini,et al.  System-level power optimization: techniques and tools , 1999, ISLPED '99.

[24]  Chau-Wen Tseng,et al.  Data transformations for eliminating conflict misses , 1998, PLDI.

[25]  Wei Zhang,et al.  Compiler-directed instruction cache leakage optimization , 2002, MICRO.

[26]  Vivek De,et al.  A new technique for standby leakage reduction in high-performance circuits , 1998, 1998 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.98CH36215).

[27]  Monica S. Lam,et al.  A data locality optimizing algorithm , 1991, PLDI '91.

[28]  Krste Asanovic,et al.  Dynamic fine-grain leakage reduction using leakage-biased bitlines , 2002, ISCA.

[29]  Scott Mahlke,et al.  Three Superblock Scheduling Models for Superscalar and Superpipelined Processors , 1991 .

[30]  Kemal Ebcioglu,et al.  VLIW compilation techniques in a superscalar environment , 1994, PLDI '94.

[31]  Jun Yang,et al.  Energy efficient Frequent Value data Cache design , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[32]  Michael Wolfe,et al.  High performance compilers for parallel computing , 1995 .

[33]  Trevor Mudge,et al.  Drowsy instruction caches. Leakage power reduction using dynamic voltage scaling and cache sub-bank prediction , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[34]  Shin'ichiro Mutoh,et al.  1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS , 1995, IEEE J. Solid State Circuits.

[35]  J. Ramanujam,et al.  Optimal software pipelining of nested loops , 1994, Proceedings of 8th International Parallel Processing Symposium.

[36]  Monica S. Lam,et al.  RETROSPECTIVE : Software Pipelining : An Effective Scheduling Technique for VLIW Machines , 1998 .

[37]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[38]  T. Sakurai,et al.  A super cut-off CMOS (SCCMOS) scheme for 0.5-V supply voltage with picoampere stand-by current , 2000, IEEE Journal of Solid-State Circuits.

[39]  Michael F. P. O'Boyle,et al.  Integrating loop and data transformations for global optimisation , 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).