System management to comply with SLA availability guarantees in cloud computing

Service level agreements (SLAs) are a common means of defining the specifications and requirements of cloud computing services in business relationships. The terms that define the guaranteed availability over a given period are fundamental to these contracts. In this context, a natural question for cloud providers is how to guarantee the promised availability. This paper studies the level of availability offered to a virtual machine during an SLA period in clouds that differ in size, redundancy, and fault tolerance techniques. Finally, this paper proposes using the SLA budget to implement smart policies for (i) the assignment of spare servers when virtual machines are restored and (ii) the dynamic use of different fault tolerance licenses. Such policies considerably reduce the probability of breaching the SLA guarantee by making efficient use of the available cloud resources. This paper is a first step in the design of SLA-aware cloud architectures.
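
As a rough illustration of the budget idea, the sketch below assumes that the SLA budget is the downtime still allowed in the contract period before the availability guarantee is breached; the abstract does not give the paper's exact definition. All function names, the policy labels, and the threshold rule are hypothetical and are not taken from the paper.

```python
def downtime_budget_minutes(availability_target: float, period_minutes: float) -> float:
    """Total downtime allowed over the SLA period (assumed definition)."""
    return (1.0 - availability_target) * period_minutes


def remaining_budget_minutes(availability_target: float,
                             period_minutes: float,
                             downtime_so_far: float) -> float:
    """Downtime still allowed before the guarantee is breached."""
    return downtime_budget_minutes(availability_target, period_minutes) - downtime_so_far


def choose_protection(remaining_minutes: float, expected_restore_minutes: float) -> str:
    """Hypothetical policy: escalate protection when one more ordinary
    restore would use up the remaining budget."""
    if remaining_minutes <= expected_restore_minutes:
        return "high"      # e.g. prioritise a spare server or a stronger fault tolerance license
    return "standard"


# Example: 99.9% availability promised over a 30-day period.
period = 30 * 24 * 60                      # minutes in the SLA period
budget = downtime_budget_minutes(0.999, period)
left = remaining_budget_minutes(0.999, period, downtime_so_far=35.0)
policy = choose_protection(left, expected_restore_minutes=10.0)
```

With these illustrative numbers the total budget is about 43.2 minutes, so after 35 minutes of accumulated downtime a virtual machine has roughly 8 minutes left and would be moved to the stronger protection level.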
