Energy-efficient and SLA-based management of IaaS Cloud Data Centers

Cloud computing is progressively being adopted in different scenarios because it offers on-demand, flexible, and highly scalable access to large-scale distributed resources, with management driven by Service Level Agreements. Virtualization is the basic technology of cloud computing, providing flexible and scalable system services to cloud systems. As these distributed systems become more widespread, companies and resource providers are building large warehouse-sized data centers to cope with the increasing demand for computing resources. However, the amount of electrical energy consumed by data centers increases with the amount of computing power installed. Likewise, as compute systems grow in size and complexity, failure events become the norm rather than the exception, which increases energy waste even further and degrades the Quality-of-Service perceived by end-users. Moreover, current virtualization technologies do not provide performance isolation: two applications running in independent virtual machines can interfere with each other's execution when they share the same physical server, thereby violating the Quality-of-Service constraints. This thesis presents two improved mechanisms with the twofold objective of saving electrical costs and respecting the Service Level Agreements stipulated with users. The first objective is achieved by combining an energy-optimizing mechanism, which detects and mitigates energy inefficiencies, with virtualization tools that provide proactive fault tolerance and energy efficiency to virtual clusters. Energy inefficiencies are reduced by dynamically consolidating virtual machines and switching physical nodes off and on according to resource demand. Consolidation is implemented based on vertical and horizontal elasticity of resources.
The second objective is achieved by coupling a performance estimation mechanism, which detects deviations from application Quality-of-Service requirements, with virtualization tools that adapt the mapping of virtual machines to servers. Two types of workloads are considered, namely CPU- and network-bound workloads, with different Quality-of-Service constraints. The performance of the proposed mechanisms is analyzed via simulation and experiments in a real cloud testbed. The workloads, failures, and performance characteristics used in the tests are consistent with the attributes outlined in state-of-the-art studies of large-scale data centers. For the first objective, the results indicate that the proposed strategy improves the work per Joule ratio by approximately 12.9% and the working efficiency by almost 15.9% compared with other state-of-the-art algorithms. For the second objective, the results show that the proposed performance-enforcing mechanism is able to fulfil the contracted SLAs of real-world environments while reducing energy costs by up to 21%.
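The consolidation idea mentioned in the abstract (packing virtual machines onto fewer servers so that idle nodes can be switched off) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it uses a generic first-fit decreasing heuristic as a stand-in, and the names `consolidate` and `hosts_to_power_off`, the single normalized CPU dimension, and the example demands are all assumptions made for illustration.

```python
from typing import Dict, List


def consolidate(vm_demand: Dict[str, float],
                host_capacity: float,
                n_hosts: int) -> List[List[str]]:
    """Place VMs on hosts with first-fit decreasing (illustrative stand-in).

    vm_demand maps a hypothetical VM name to its normalized CPU demand.
    Returns one list of VM names per host; empty lists are unused hosts.
    """
    placement: List[List[str]] = [[] for _ in range(n_hosts)]
    free = [host_capacity] * n_hosts
    # Consider the largest VMs first so they get space before small ones.
    for vm, demand in sorted(vm_demand.items(),
                             key=lambda kv: kv[1], reverse=True):
        for i in range(n_hosts):
            # Small tolerance guards against floating-point rounding.
            if demand <= free[i] + 1e-9:
                placement[i].append(vm)
                free[i] -= demand
                break
        else:
            raise ValueError(f"no host can accommodate VM {vm!r}")
    return placement


def hosts_to_power_off(placement: List[List[str]]) -> List[int]:
    """Indices of hosts left without any VM, i.e. candidates to switch off."""
    return [i for i, vms in enumerate(placement) if not vms]


if __name__ == "__main__":
    demands = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.4}  # assumed example load
    placement = consolidate(demands, host_capacity=1.0, n_hosts=4)
    print(placement)
    print(hosts_to_power_off(placement))
```

A real mechanism of the kind the abstract describes would also have to weigh migration cost, failure predictions, and interference between co-located virtual machines, none of which this sketch models.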
