Differentiated Availability in Cloud Computing SLAs

Cloud computing is the new trend in service delivery and promises large cost savings and agility for customers. However, some challenges remain to be solved before it can see widespread use. This is especially relevant for enterprises, which currently lack the assurance they need to move their critical data and applications to the cloud: current cloud SLAs are simply not good enough. This paper focuses on the availability attribute of a cloud SLA and develops a complete model for cloud data centers, including the network. Different techniques for increasing availability in a virtualized system are investigated, and the resulting availability is quantified. The results show that, depending on the failure rates, different deployment scenarios and fault-tolerance techniques can be used to achieve availability differentiation. However, large differences can be seen when different priority levels are used for restarting virtual machines.
