Modeling and evaluation of mixed redundancy strategy with instant switching in cloud-based systems

Mixed redundancy strategies are widely used in cloud-based systems, but their node-switching mechanism differs from that of the traditional mixed strategy. Related research, however, often concentrates on the traditional mixed redundancy strategy, in which cold standby components start working only after all active nodes have failed. This paper therefore develops a model to evaluate the reliability and performance of a cloud-based degraded system subject to a mixed active and cold standby redundancy strategy with a continual monitoring and detection mechanism. It is assumed that node switching is triggered as soon as some active nodes fail and standby nodes are available. A continuous-time Markov chain is built on the state transition process, and transient and steady-state availability together with the expected job completion rate are used to evaluate the system with or without repair facilities. A numerical method is used to solve the model, and sensitivity analysis is conducted on different redundancy strategies. Illustrative examples using real-world data are presented to explain how the probability of each state and the different kinds of availability and performance are calculated. The comparison with the traditional mixed redundancy strategy shows that system behavior differs between the two kinds of mixed strategy and that the analysis model for the traditional strategy is not suitable for strategies in cloud-based systems.
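
As a rough illustration of the kind of steady-state calculation described above, the following sketch treats a deliberately simplified system. It is not the paper's model. The assumptions are ours: identical nodes, exponential failure rate `lam` per working node, a single repair facility with rate `mu > 0`, instantaneous and perfect switching to a cold standby node whenever one is available, and a state defined as the number of failed nodes. Under these assumptions the chain is a birth-death process, so the steady-state probabilities follow from the product-form solution instead of a general numerical solver. All function and parameter names are hypothetical.

```python
def steady_state(n_active, n_standby, lam, mu):
    """Steady-state probabilities of a birth-death CTMC (illustrative only).

    State k = number of failed nodes, k = 0 .. n_active + n_standby.
    With instant switching, the number of working nodes in state k is
    min(n_active, total - k). Requires mu > 0 (a repair facility exists).
    """
    total = n_active + n_standby
    weights = [1.0]  # unnormalised probability of state 0
    for k in range(total):
        working = min(n_active, total - k)
        # Balance equation: pi[k] * (working * lam) = pi[k + 1] * mu
        weights.append(weights[-1] * working * lam / mu)
    norm = sum(weights)
    return [w / norm for w in weights]


def working_nodes(k, n_active, n_standby):
    """Number of nodes doing useful work when k nodes have failed."""
    return min(n_active, n_active + n_standby - k)


def availability(pi, n_active, n_standby, min_working=1):
    """Probability that at least min_working nodes are working."""
    return sum(
        p for k, p in enumerate(pi)
        if working_nodes(k, n_active, n_standby) >= min_working
    )


def expected_completion_rate(pi, n_active, n_standby, rate_per_node):
    """Expected job completion rate, assuming each working node
    completes jobs at rate_per_node (an assumed performance measure)."""
    return sum(
        p * working_nodes(k, n_active, n_standby) * rate_per_node
        for k, p in enumerate(pi)
    )


if __name__ == "__main__":
    # One active node, one cold standby, equal failure and repair rates.
    pi = steady_state(n_active=1, n_standby=1, lam=1.0, mu=1.0)
    print("state probabilities:", pi)
    print("availability:", availability(pi, 1, 1))
    print("completion rate:", expected_completion_rate(pi, 1, 1, 2.0))
```

A model without repair facilities has no useful steady state (the all-failed state absorbs everything), which is why that case calls for transient analysis rather than a calculation like this one.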
