Recovering from Cloud Application Deployment Failures Through Re-execution

This paper studies automated cloud application deployment and configuration. Transient failures are common in current cloud infrastructures because of the complexity of the underlying software and hardware stacks. These failures disrupt application deployment and force users to inspect the deployment process and intervene manually. To address this challenge, we propose a simple yet powerful deployment methodology with error recovery: it identifies the dependencies among configuration scripts and re-executes the appropriate scripts when a failure occurs. To guarantee idempotent script execution, we adopt a filesystem snapshot mechanism that lets our approach revert to a healthy filesystem state after a failed script execution. Our experimental analysis indicates that the approach can resolve any transient deployment failure that appears during the deployment phase, even in highly unpredictable cloud environments.
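The recovery idea summarized above (order scripts by dependency, snapshot before each script, revert and re-execute on failure) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `state` dict stands in for the filesystem, `copy.deepcopy` stands in for a filesystem snapshot, and the names `deploy`, `TransientError`, `install_db`, and `configure_app` are hypothetical and chosen only for illustration.

```python
import copy
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class TransientError(Exception):
    """Hypothetical stand-in for a transient failure (e.g. a network timeout)."""


def deploy(scripts, deps, state, max_retries=3):
    """Run scripts in dependency order; revert `state` and re-execute on failure.

    scripts: name -> callable(state) that mutates the state dict
    deps:    name -> set of script names that must finish first
    Returns a dict mapping each script name to the attempt that succeeded.
    """
    attempts = {}
    graph = {name: deps.get(name, set()) for name in scripts}
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_retries + 1):
            # Assumption: a deep copy plays the role of a filesystem snapshot.
            snapshot = copy.deepcopy(state)
            try:
                scripts[name](state)
                attempts[name] = attempt
                break
            except TransientError:
                # Revert partial side effects so the retry starts from a clean state.
                state.clear()
                state.update(snapshot)
        else:
            raise RuntimeError(f"{name} failed after {max_retries} attempts")
    return attempts


# Usage: configure_app leaves a partial side effect and fails once.
calls = {"n": 0}

def install_db(state):
    state.setdefault("packages", []).append("db")

def configure_app(state):
    state.setdefault("packages", []).append("app")  # partial side effect
    calls["n"] += 1
    if calls["n"] == 1:
        raise TransientError("simulated timeout")
    state["configured"] = True

state = {}
attempts = deploy(
    {"install_db": install_db, "configure_app": configure_app},
    {"configure_app": {"install_db"}},
    state,
)
```

Without the revert step, the retried `configure_app` would append `"app"` a second time; restoring the snapshot is what makes re-execution behave idempotently in this sketch.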
