Adaptive memory power management techniques for HPC workloads

The memory subsystem is responsible for a large fraction of the energy consumed by compute nodes in High Performance Computing (HPC) systems. The rapid increase in the number of cores has been accompanied by a corresponding increase in DRAM capacity and bandwidth, and as a result the memory system consumes a significant share of the power budget available to a compute node. Consequently, there is a broad research effort focused on power management techniques that use DRAM low-power modes. However, memory power management continues to present many challenges. In this paper, we study the potential of Dynamic Voltage and Frequency Scaling (DVFS) of the memory subsystem, and consider the ability to select different frequencies for different memory channels. Our approach tunes voltage and frequency dynamically to maximize energy savings while keeping performance degradation within tolerable limits. We assume that HPC applications do not demand maximum bandwidth throughout their entire execution. During intervals of low memory demand, applications can tolerate a reduction in bandwidth, so the memory frequency can be lowered to save energy. We also study application channel access patterns, and use these patterns to determine the additional energy savings that can be achieved by controlling each channel independently. We then evaluate the proposed DVFS algorithm using a novel hybrid evaluation methodology that combines simulation with execution on real hardware. Our results demonstrate the large potential of adaptive memory power management techniques based on DVFS for HPC workloads.
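As a rough illustration of per-channel frequency selection, the following sketch picks, for each memory channel, the lowest frequency whose peak bandwidth still covers the demand observed in the last interval. It is a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the function names, the `max_utilization` headroom parameter, and the choice of standard DDR3 speed bins as frequency levels are all hypothetical.

```python
# Illustrative frequency levels: (memory clock in MHz, peak bandwidth in GB/s
# for one 64-bit channel). These correspond to standard DDR3 speed bins and
# are an assumption for this sketch, not values taken from the paper.
FREQ_LEVELS = [
    (400, 6.4),    # DDR3-800
    (533, 8.5),    # DDR3-1066
    (667, 10.7),   # DDR3-1333
    (800, 12.8),   # DDR3-1600
]


def select_channel_frequency(demand_gbps, max_utilization=0.7):
    """Return the lowest clock (MHz) whose derated peak bandwidth covers demand.

    max_utilization is a hypothetical headroom knob: a level is accepted only
    if the demand stays below this fraction of its peak bandwidth, which
    stands in for "performance degradation within tolerable limits".
    """
    for freq_mhz, peak_gbps in FREQ_LEVELS:
        if demand_gbps <= peak_gbps * max_utilization:
            return freq_mhz
    # Demand exceeds every derated level: run at the highest frequency.
    return FREQ_LEVELS[-1][0]


def select_frequencies(per_channel_demand, max_utilization=0.7):
    """Choose a frequency for each channel independently."""
    return [select_channel_frequency(d, max_utilization)
            for d in per_channel_demand]


if __name__ == "__main__":
    # Demand (GB/s) measured on four channels during the last interval.
    demand = [1.0, 5.0, 8.0, 12.0]
    print(select_frequencies(demand))
```

Lightly loaded channels drop to a low frequency while busy channels stay at full speed, which is the intuition behind controlling channels independently. A real policy would also need to account for voltage levels, frequency transition costs, and a model of the resulting slowdown.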
