High Availability of Clouds: Failover Strategies for Cloud Computing Using Integrated Checkpointing Algorithms

This paper presents an approach for providing high availability for requests from cloud clients. To this end, it proposes failover strategies for cloud computing that use integrated checkpointing algorithms. The proposed strategy integrates checkpointing with load-balancing algorithms and uses multilevel checkpointing to reduce checkpointing overhead. To implement the proposed failover strategies, a cloud simulation environment was developed that provides high availability to clients when service nodes fail or recover. The paper also compares the developed simulator with existing methods. The proposed failover strategy operates at the application layer and provides high availability for the Platform as a Service (PaaS) model of cloud computing.
