CASHIER: A Cache Energy Saving Technique for QoS Systems

With each CMOS technology generation, leakage energy has been increasing at an exponential rate; hence, managing the energy consumption of large last-level caches (LLCs) is becoming a critical research issue in modern chip design. Saving cache energy in QoS systems is especially challenging because, to avoid missing deadlines, a suitable balance must be struck between energy saving and performance loss. We present CASHIER, a Cache Energy Saving Technique for Quality of Service Systems. CASHIER uses dynamic profiling to estimate the memory subsystem energy and execution time of the program under multiple LLC configurations. It then reconfigures the LLC to an energy-efficient configuration with a view to meeting the deadline. In QoS systems, the allowed slack may be specified either as a percentage of the baseline execution time or as an absolute slack, and CASHIER works in both cases. Experiments show the effectiveness of CASHIER in saving cache energy. For example, for an L2 cache size of 2MB and 5% allowed slack over the baseline, the average saving in memory subsystem energy with CASHIER is 23.6%.
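
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `ConfigEstimate`, `deadline_from_slack`, and `pick_config`, the rule of choosing the lowest-energy configuration among those that meet the deadline, and all numbers are assumptions made for illustration. The sketch assumes that profiling has already produced an energy and execution-time estimate for each candidate LLC configuration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConfigEstimate:
    """Hypothetical profiling result for one LLC configuration."""
    name: str
    energy_joules: float   # estimated memory subsystem energy
    time_seconds: float    # estimated execution time


def deadline_from_slack(baseline_time: float,
                        percent_slack: Optional[float] = None,
                        absolute_slack: Optional[float] = None) -> float:
    """Turn an allowed slack into a deadline.

    Exactly one of the two slack forms must be given: a percentage of
    the baseline execution time, or an absolute amount of time.
    """
    if (percent_slack is None) == (absolute_slack is None):
        raise ValueError("specify exactly one of percent_slack or absolute_slack")
    if percent_slack is not None:
        return baseline_time * (1.0 + percent_slack / 100.0)
    return baseline_time + absolute_slack


def pick_config(estimates: Iterable[ConfigEstimate],
                deadline: float) -> Optional[ConfigEstimate]:
    """Return the lowest-energy configuration whose estimated time meets
    the deadline, or None if no configuration is feasible."""
    feasible = [e for e in estimates if e.time_seconds <= deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda e: e.energy_joules)


# Illustrative (made-up) estimates for three cache configurations.
estimates = [
    ConfigEstimate("full", energy_joules=100.0, time_seconds=10.0),
    ConfigEstimate("half", energy_joules=80.0, time_seconds=10.4),
    ConfigEstimate("quarter", energy_joules=70.0, time_seconds=11.2),
]

# Slack given as a percentage of the baseline execution time.
percent_choice = pick_config(
    estimates, deadline_from_slack(10.0, percent_slack=5.0))

# Slack given as an absolute amount of time.
absolute_choice = pick_config(
    estimates, deadline_from_slack(10.0, absolute_slack=2.0))
```

With these made-up numbers, a 5% slack admits only the "full" and "half" configurations, whereas an absolute slack of 2 seconds also admits "quarter". A looser deadline therefore allows a smaller, lower-energy configuration.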
