DUAL: Reliability-Aware Power Management in Data Centers

A virtualized data center hosts users and applications in a large number of virtual machines (VMs) to achieve easy provisioning and high utilization of physical resources. Energy efficiency and reliability are two primary concerns in operating a data center. Power-saving techniques such as dynamic voltage and frequency scaling (DVFS) are often used to lower CPU supply voltages at runtime when system utilization is low. However, DVFS can decrease system reliability: processors running at low voltages are more susceptible to soft errors, which may crash individual VMs or the entire system. In this work, we propose DUAL, a data center management framework that includes new virtual machine power and reliability analysis tools. The framework is designed to balance the dual needs of a data center: reducing energy consumption and providing high reliability. Our evaluations show that DUAL can help maintain the desired reliability while significantly reducing power consumption, which in turn lowers the overall operational cost of a data center.
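The abstract does not spell out DUAL's power or reliability models. As a rough illustration of the trade-off it describes, the sketch below picks the lowest-energy DVFS level that still meets a reliability target. It rests on several hypothetical assumptions: frequencies are normalized so that 1.0 is the maximum; voltage scales linearly with frequency, so dynamic power grows roughly as f³; and the soft-error rate follows an exponential model of the kind common in the reliability-aware DVFS literature. The names `fault_rate`, `evaluate`, and `pick_frequency`, the parameters `lam0`, `d`, `f_min`, and `p_static`, and all constants are illustrative and are not DUAL's.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Setting:
    freq: float         # normalized CPU frequency (1.0 = maximum)
    energy: float       # energy to finish the work, in normalized units
    reliability: float  # probability of finishing without a soft error


def fault_rate(f: float, lam0: float, d: float, f_min: float) -> float:
    """Hypothetical exponential model: the soft-error rate grows by a
    factor of 10**d as frequency (and voltage) drops from 1.0 to f_min."""
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))


def evaluate(f: float, work: float, lam0: float, d: float,
             f_min: float, p_static: float) -> Setting:
    runtime = work / f          # a slower clock means a longer execution
    power = p_static + f ** 3   # assumes V proportional to f, so C*V^2*f ~ f^3
    rate = fault_rate(f, lam0, d, f_min)
    reliability = math.exp(-rate * runtime)  # Poisson faults: P(no fault)
    return Setting(f, power * runtime, reliability)


def pick_frequency(levels: Sequence[float], work: float, r_target: float,
                   lam0: float = 1e-6, d: float = 3.0, f_min: float = 0.4,
                   p_static: float = 0.1) -> Optional[Setting]:
    """Return the lowest-energy level whose reliability meets r_target,
    or None if no level in `levels` meets it."""
    candidates = [evaluate(f, work, lam0, d, f_min, p_static) for f in levels]
    feasible = [s for s in candidates if s.reliability >= r_target]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s.energy)


if __name__ == "__main__":
    levels = [1.0, 0.8, 0.6, 0.4]
    for target in (0.999, 0.98, 0.8):
        print(target, pick_frequency(levels, work=1000.0, r_target=target))
```

With these made-up constants, dropping the clock always saves energy. A stricter reliability target therefore forces the choice back toward full speed. For example, a target of 0.98 selects 0.8 and a target of 0.8 selects 0.6. This is the tension between the two goals that the framework is meant to balance.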
