Risk-Driven Proactive Fault-Tolerant Operation of IaaS Providers

To improve service execution in Clouds, Cloud Infrastructure management has to take measures to adhere to Service Level Agreements (SLAs) and Business Level Objectives, from the application layer down to how services are supported at the lowest hardware levels. This paper develops a risk model methodology and a holistic management approach specific to the operation of the Cloud Infrastructure Provider, and applies them to improve SLA fault tolerance in Cloud Infrastructure. Risk assessments analyse execution-specific data from the Cloud Infrastructure and are linked to a business-driven holistic management component that is part of a Cloud Manager. Initial results show that the combined risk-based fault tolerance approach improves eco-efficiency and virtual machine availability and reduces SLA failures across the whole Cloud infrastructure.
