An Integrated Virtualized Strategy for Fault Tolerance in Cloud Computing Environment

Cloud fault tolerance is the ability of a cloud to keep performing its functions correctly even when faults occur in the system. This property allows a complete system to continue functioning in the event of one or more faults, which matters for high availability of virtual machines and for life-critical systems. A fault-tolerant design may allow the system to function at a reduced level rather than fail completely. Because fault tolerance is a major concern in guaranteeing the availability and reliability of critical services and application execution in cloud environments, research in this area focuses on detection and recovery strategies. To minimize the impact of failures and to anticipate and handle them proactively, a model called the Integrated Virtualized Failover Strategy (IVFS) was introduced, in which fault tolerance is realized using redundancy, checkpoint/replay, and a fault manager. In this paper, we critically analyze this model and propose a model that tolerates faults based on the reliability of each computing node or virtual machine, removing a node from the availability list if its performance is not optimal. Our algorithm yields an increase in pass rates and considers forward and backward recovery using diverse software tools. Our simulation results suggest good performance compared with existing models. The results are demonstrated through experimental validation with a critical analysis, laying the foundation for a fully fault-tolerant IaaS cloud environment.
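The idea of keeping a reliability value per node and dropping poorly performing nodes from the availability list can be illustrated with a minimal sketch. The abstract does not give the actual update rule, so everything here is an assumption made for illustration: the `Node` class, the `reward` and `penalty` step sizes, the `threshold` value, and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A computing node or virtual machine (hypothetical structure)."""
    name: str
    reliability: float = 1.0  # assumed to lie in [0.0, 1.0]


def update_reliability(node: Node, passed: bool,
                       reward: float = 0.05, penalty: float = 0.2) -> None:
    """Raise reliability after a passed task, lower it after a failed one.

    The additive rule and the step sizes are illustrative assumptions.
    """
    if passed:
        node.reliability = min(1.0, node.reliability + reward)
    else:
        node.reliability = max(0.0, node.reliability - penalty)


def prune(available: list[Node], threshold: float = 0.5) -> list[Node]:
    """Return only the nodes whose reliability is at or above the threshold."""
    return [n for n in available if n.reliability >= threshold]


# Example: vm2 fails three tasks in a row and is removed from the list.
nodes = [Node("vm1"), Node("vm2")]
update_reliability(nodes[0], passed=True)
for _ in range(3):
    update_reliability(nodes[1], passed=False)
available = prune(nodes)
available_names = [n.name for n in available]
```

With these assumed parameters, a node that keeps failing falls below the threshold after a few tasks and stops receiving work, while a node that keeps passing stays on the list. A real implementation would also need a way to re-admit repaired nodes, which the abstract does not describe.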
