XDRA: Exploration and optimization of last-level cache for energy reduction in DDR DRAMs

Embedded systems with high energy consumption often exploit the idleness of DDR-DRAM to reduce their energy consumption, putting the DRAM into its deepest low-power mode (self-refresh power-down mode) during idle periods. DDR-DRAM idle periods depend heavily on the last-level cache configuration, but an exhaustive search of configurations using processor-memory simulators can take several months. This paper proposes, for the first time, a fast framework called XDRA, which allows the exploration of last-level cache configurations to improve DDR-DRAM energy efficiency. XDRA combines a processor-memory simulator, a cache simulator, and novel analysis techniques to produce a Kriging-based estimator that predicts the energy savings of different cache configurations for a given main memory size and application. Estimator errors were less than 4.4% on average for 11 applications from the MediaBench and SPEC2000 suites and two DRAM sizes (Micron DDR3-DRAM 256MB and 4GB). Cache configurations selected by XDRA were on average 3.6× and 4× more energy efficient (cache and DRAM energy) than a common cache configuration. XDRA selected the optimal cache configuration in 20 of the 22 cases (11 applications × two DRAM sizes). The two suboptimal configurations were at most 3.9% from their optimal counterparts. XDRA took a few days to explore 330 cache configurations, compared to several hundred days of cycle-accurate simulation, saving at least 85% of exploration time.
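As a rough illustration of what a Kriging-based estimator does, the sketch below fits a simple Kriging interpolator to a handful of sampled cache configurations and predicts the energy saving at an unsampled one. It is not XDRA's estimator: the Gaussian correlation function, the fixed `theta` values, the constant-mean simplification, the encoding of a configuration as (log2 cache size in KB, log2 associativity), and every data value are assumptions chosen only for the example.

```python
import math

def gauss_corr(a, b, theta):
    """Gaussian correlation between two configuration points."""
    return math.exp(-sum(t * (p - q) ** 2 for t, p, q in zip(theta, a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kriging(X, y, theta, nugget=1e-10):
    """Return a predictor that interpolates the samples (X, y).

    Simplification: the process mean is taken to be the sample mean,
    and theta is fixed rather than fitted by maximum likelihood.
    """
    mean = sum(y) / len(y)
    # Correlation matrix of the samples; the nugget keeps it well conditioned.
    R = [[gauss_corr(a, b, theta) + (nugget if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    weights = solve(R, [v - mean for v in y])

    def predict(x):
        r = [gauss_corr(x, xi, theta) for xi in X]
        return mean + sum(w * ri for w, ri in zip(weights, r))

    return predict

# Hypothetical samples: (log2 cache size in KB, log2 associativity)
# mapped to a made-up energy saving in percent.
X = [(4.0, 0.0), (4.0, 2.0), (6.0, 1.0), (8.0, 0.0), (8.0, 2.0)]
y = [10.0, 14.0, 22.0, 18.0, 26.0]
theta = (0.5, 0.5)

predict = fit_kriging(X, y, theta)
estimate = predict((6.0, 2.0))  # an unsampled configuration
print(f"estimated saving at (6.0, 2.0): {estimate:.2f}%")
```

The predictor reproduces the sampled values at the sampled points and falls back to the sample mean far away from them. This is the property that lets a small number of expensive simulations stand in for a much larger configuration space.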
