Leakage energy estimates for HPC applications

Large-scale high-performance systems are energy constrained. With thousands of processing cores at their disposal, these machines contain large amounts of on-chip cache. As transistor threshold voltages have decreased, leakage current and the associated energy losses have increased dramatically. Taken together, these two trends make on-chip caches responsible for a large portion of total leakage energy losses. In this work, we quantify on-chip leakage energy losses across a wide set of applications. Our scheme profiles applications to measure cache accesses and uses those counts to estimate energy consumption at each level of the cache hierarchy. Our study indicates that leakage is the dominant form of energy dissipation in on-chip caches and may account for up to 80% of total cache energy, a share that is expected to grow with every new generation of semiconductor process. Our results also suggest that compiler optimizations have a very limited effect on the total energy consumption of the caches. Regardless of the optimizations applied, leakage in caches cannot be effectively addressed by software techniques; it requires intervention at the circuit and architectural levels. Cache leakage therefore cannot be neglected in attacking the energy barrier to building exascale systems.
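The estimation approach described in the abstract can be illustrated with a minimal sketch. It assumes a simple model in which each cache level's dynamic energy is its access count times a per-access energy, and its leakage energy is a static leakage power times the run time. The names `CacheLevel`, `cache_energy`, and `leakage_fraction` are hypothetical, and the per-access energy and leakage power are placeholder inputs that would have to come from a cache model or measurement; none of this is taken from the paper itself.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    """One cache level with profiled and modeled inputs (all values are assumptions)."""
    name: str
    accesses: int                 # access count obtained from profiling
    energy_per_access_nj: float   # dynamic energy per access, in nanojoules
    leakage_power_mw: float       # static leakage power, in milliwatts


def cache_energy(level: CacheLevel, runtime_s: float) -> tuple[float, float]:
    """Return (dynamic, leakage) energy in joules for one cache level."""
    dynamic_j = level.accesses * level.energy_per_access_nj * 1e-9
    leakage_j = level.leakage_power_mw * 1e-3 * runtime_s
    return dynamic_j, leakage_j


def leakage_fraction(levels: list[CacheLevel], runtime_s: float) -> float:
    """Return leakage energy as a fraction of total cache energy."""
    total_dynamic = 0.0
    total_leakage = 0.0
    for level in levels:
        dynamic_j, leakage_j = cache_energy(level, runtime_s)
        total_dynamic += dynamic_j
        total_leakage += leakage_j
    return total_leakage / (total_dynamic + total_leakage)
```

Under this model, leakage energy grows with run time and cache size even when the access count stays fixed, which is one way to read the abstract's observation that reducing accesses in software has limited effect on total cache energy.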