Lifetime or energy: Consolidating servers with reliability control in virtualized cloud datacenters

Server consolidation using virtualization technologies allows cloud-scale datacenters to improve resource utilization and energy efficiency. However, most existing consolidation strategies focus solely on balancing the tradeoff between the service-level agreements (SLAs) desired by cloud applications and the energy costs incurred by hosting servers. With fluctuating workloads in datacenters, the lifetime and reliability of servers under dynamic power-aware consolidation could be adversely impacted by repeated on-off thermal cycles, wear-and-tear, and temperature rise. In this paper, we propose a Reliability-Aware server Consolidation stratEgy, named RACE, to address when and how to perform energy-efficient server consolidation in a reliability-friendly and profitable way. The focus is on characterizing and analyzing this problem as a multi-objective optimization, by developing a utility model that unifies multiple constraints on performance SLAs, reliability factors, and energy costs in a holistic manner. An improved grouping genetic algorithm is proposed to search for the globally optimal solution; it takes advantage of a collection of reliability-aware resource buffering and virtual-machine-to-server re-mapping heuristics to generate good initial solutions and improve the convergence rate. Extensive simulations are conducted to validate the effectiveness, scalability, and overhead of RACE in improving the overall utility of datacenters while avoiding unprofitable consolidation in the long term, compared with the pMapper and PADD strategies for server consolidation.
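To make the idea of a unified utility concrete, the following is a minimal sketch of how energy savings, SLA penalties, and reliability costs might be combined into a single profitability check. It is not the RACE utility model, which the abstract does not specify. The class `ConsolidationPlan`, the functions `plan_utility` and `is_profitable`, the linear cost form, and all default prices are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConsolidationPlan:
    """Hypothetical summary of one candidate consolidation plan."""
    energy_saved_kwh: float     # energy saved by powering servers down
    sla_violation_hours: float  # expected SLA violation time caused by the plan
    power_cycles: int           # extra on-off cycles imposed on servers


def plan_utility(plan: ConsolidationPlan,
                 energy_price: float = 0.10,
                 sla_penalty_per_hour: float = 5.0,
                 wear_cost_per_cycle: float = 0.5) -> float:
    """Return a scalar utility: energy gain minus SLA and reliability costs.

    All three weights are illustrative assumptions, expressed in the same
    monetary unit so that the terms can be added directly.
    """
    energy_gain = plan.energy_saved_kwh * energy_price
    sla_cost = plan.sla_violation_hours * sla_penalty_per_hour
    reliability_cost = plan.power_cycles * wear_cost_per_cycle
    return energy_gain - sla_cost - reliability_cost


def is_profitable(plan: ConsolidationPlan, **prices: float) -> bool:
    """A plan is worth executing only if its utility is strictly positive."""
    return plan_utility(plan, **prices) > 0.0


if __name__ == "__main__":
    aggressive = ConsolidationPlan(energy_saved_kwh=20.0,
                                   sla_violation_hours=0.0,
                                   power_cycles=10)
    moderate = ConsolidationPlan(energy_saved_kwh=100.0,
                                 sla_violation_hours=0.5,
                                 power_cycles=4)
    print("aggressive profitable:", is_profitable(aggressive))
    print("moderate profitable:", is_profitable(moderate))
```

Under these assumed weights, a plan that saves little energy but forces many on-off cycles is rejected, which mirrors the abstract's point that energy-driven consolidation can be unprofitable once reliability costs are counted.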
