A Migration Approach for Fault Tolerance in Cloud Computing

Cloud computing has become a significant technology trend and an effective solution for providing flexible, on-demand, and dynamically scalable computing infrastructure for many applications. With cloud computing, users access programs, storage, and application-development platforms over the Internet from a variety of devices, through services offered by cloud providers. The probability that a failure occurs during execution grows as the number of nodes increases; since it is impossible to fully prevent failures, one solution is to implement fault tolerance mechanisms. Fault tolerance has become a major concern for computer engineers and software developers because the occurrence of faults increases the cost of using resources. In this paper, the authors propose an approach that combines migration and checkpoint mechanisms. The checkpoint mechanism minimizes the time lost and reduces the effect of failures on application execution, while the migration mechanism guarantees the continuity of application execution and avoids any loss due to hardware failure in a transparent and efficient way. The simulation results show the effectiveness of the proposed approach to fault tolerance in terms of execution time and masking the effects of failures.
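
To make the interplay of the two mechanisms concrete, the following is a minimal toy sketch, not the authors' algorithm or simulator. It assumes a task made of unit work steps, a hypothetical `Node` type that fails after a fixed number of steps, and a fixed `checkpoint_interval`; all of these names and parameters are illustrative assumptions. On a failure, the task migrates to the next node and resumes from the last checkpoint, so only the work done since that checkpoint is repeated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical compute node for illustration only."""
    name: str
    # Number of work steps this node completes before it fails;
    # None means the node never fails.
    fails_after: Optional[int] = None


def run_with_checkpoint_and_migration(
    total_steps: int, nodes: List[Node], checkpoint_interval: int
) -> Tuple[int, int]:
    """Run a task of `total_steps` unit steps.

    Returns (executed_steps, migrations), where executed_steps counts
    every step actually run, including steps repeated after a failure.
    """
    checkpoint = 0      # progress saved by the last checkpoint
    progress = 0        # current progress of the task
    executed = 0        # total steps executed, including repeated work
    migrations = 0
    node_idx = 0
    done_on_node = 0    # steps completed on the current node

    while progress < total_steps:
        node = nodes[node_idx]
        if node.fails_after is not None and done_on_node >= node.fails_after:
            # Failure: migrate to the next node and roll back to the
            # last checkpoint; work since that checkpoint is lost.
            node_idx += 1
            if node_idx >= len(nodes):
                raise RuntimeError("no healthy node left to migrate to")
            migrations += 1
            progress = checkpoint
            done_on_node = 0
            continue

        progress += 1
        executed += 1
        done_on_node += 1
        if progress % checkpoint_interval == 0:
            checkpoint = progress  # save state

    return executed, migrations


if __name__ == "__main__":
    nodes = [Node("a", fails_after=7), Node("b")]
    # Checkpoint every 3 steps: only step 7 is repeated after the failure.
    print(run_with_checkpoint_and_migration(10, nodes, 3))    # (11, 1)
    # Interval larger than the task: the task restarts from step 0.
    print(run_with_checkpoint_and_migration(10, nodes, 100))  # (17, 1)
```

In this toy setting, checkpointing every 3 steps costs 11 executed steps for 10 steps of useful work, whereas migrating without a usable checkpoint costs 17. A real system would also have to account for the cost of writing checkpoints and of the migration itself, which this sketch ignores.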
