Energy-Efficient and SLA-Based Resource Management in Cloud Data Centers

Abstract: Cloud data centers play an important role in modern Information Technology (IT) infrastructures and are being progressively adopted in different scenarios. The proliferation of cloud computing has led companies and resource providers to build large warehouse-sized data centers in an effort to respond to customers' demand for computing resources. Operating such data centers requires a significant amount of electrical power, which translates into more heat to dissipate, possible thermal imbalances, and increased electricity bills. In addition, as data centers grow in size and complexity, failure events become the norm rather than the exception. Failures also contribute to energy waste, since the work already completed by terminated tasks is lost. Today's cloud data centers therefore face the challenge of reducing operational costs through improved energy utilization while provisioning dependable service to customers. This chapter discusses the causes of power and energy consumption in data centers. It discusses the advantages that cloud computing brings to the management of data center resources, and reviews the state of the art in schemes and strategies to improve the power and energy efficiency of computing resources. It also includes a practical case of energy-efficient and service-level agreement (SLA)-based resource management, which analyzes and discusses the performance of three state-of-the-art scheduling algorithms for improving energy efficiency. The chapter concludes with a review of open challenges for strategies to improve power and energy efficiency in data centers.
