Evaluating memory energy efficiency in parallel I/O workloads

Power consumption is an important issue for cluster supercomputers because it directly affects their running costs and cooling requirements. This paper investigates the memory energy efficiency of high-end data servers used for supercomputers. Emerging memory technologies allow memory devices to dynamically adjust their power states. To maximize energy savings, memory management on data servers needs to use these energy-aware devices judiciously. Exploring different management schemes under four real-world parallel I/O workloads, we find that memory energy consumption is determined by a complex interaction among four important factors: (1) cache hit rates, which may directly translate performance gains into energy savings; (2) cache populating schemes, which perform buffer allocation and affect access locality at the chip level; (3) request clustering, which aims to temporally align memory transfers from different buses into the same memory chips; and (4) access patterns in workloads, which affect the first three factors.
