Towards high-available and energy-efficient virtual computing environments in the cloud

Empowered by virtualisation technology, cloud infrastructures enable the construction of flexible and elastic computing environments, providing an opportunity to optimise energy and resource costs while enhancing system availability and achieving high performance. A crucial requirement for effective consolidation is the ability to utilise system resources efficiently for high-availability computing and energy-efficiency optimisation, reducing operational costs and the carbon footprint. Additionally, failures in highly networked computing systems can substantially degrade system performance, preventing the system from achieving its initial objectives. In this paper, we propose algorithms to dynamically construct and readjust virtual clusters to enable the execution of users' jobs. Allied with an energy-optimising mechanism that detects and mitigates energy inefficiencies, our decision-making algorithms leverage virtualisation tools to provide proactive fault tolerance and energy efficiency to virtual clusters. We conducted simulations by injecting random synthetic jobs and jobs drawn from the latest version of the Google cloud tracelogs. The results indicate that our strategy improves the work per Joule ratio by approximately 12.9% and the working efficiency by almost 15.9% compared with other state-of-the-art algorithms.
