Optimization of Full versus Incremental Periodic Backup Policy

This paper models repairable computing systems performing a mission that succeeds if the system completes a specified amount of work within the allowed mission time (the deadline). During the mission, the system performs a sequence of full and incremental data backups, which enable effective recovery and avoid repeating the entire mission work from the beginning when a failure occurs. The repair time is fixed, while the system time-to-failure may follow an arbitrary distribution. The paper makes two contributions. First, it develops a new numerical algorithm that evaluates the mission success probability and expected completion time of such repairable real-time computing systems under mixed full and incremental backups; the correctness of the algorithm is verified using Monte Carlo simulations. Second, it formulates and solves the backup schedule optimization problem, which finds the full and incremental backup frequencies that maximize the mission success probability. Illustrative examples investigate how different parameters (the system time-to-failure distribution parameter, maximum allowed mission time, data backup and retrieval times, storage availability, and repair time and efficiency) affect the mission success probability, the expected completion time, and the optimal backup schedule.
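To make the setting concrete, the following is a minimal Monte Carlo sketch of a mission with mixed full and incremental backups. It is not the paper's numerical algorithm, and every modelling detail in it is an assumption made for illustration: work is split into equal segments with a backup after each segment except the last, the first backup and every `full_every`-th backup after it is full, time-to-failure is Weibull, repair restores the system to as-good-as-new, recovery retrieves the last full backup plus each later incremental one, and storage is always available. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_mission(work, seg, full_every, t_full, t_inc, q_full, q_inc,
                     repair, deadline, shape, scale, rng):
    """Simulate one mission; return its completion time, or None if the
    deadline is missed. All modelling choices here are illustrative
    assumptions, not the paper's exact model."""
    n_segments = math.ceil(work / seg)
    last_len = work - (n_segments - 1) * seg   # length of the final segment
    clock = 0.0      # elapsed mission time
    done = 0         # segments completed (all but the last are backed up)
    restore = 0.0    # retrieval time owed after the most recent failure
    ttf = rng.weibullvariate(scale, shape)     # operating time left until failure

    while clock <= deadline:
        is_last = done == n_segments - 1
        length = last_len if is_last else seg
        if is_last:
            backup = 0.0                       # no backup after the final segment
        else:
            # backup number done+1 is full when done is a multiple of full_every
            backup = t_full if done % full_every == 0 else t_inc
        attempt = restore + length + backup    # retrieval, then work, then backup

        if ttf >= attempt:
            # the whole attempt finishes before the next failure
            clock += attempt
            ttf -= attempt
            done += 1
            restore = 0.0
            if done == n_segments:
                return clock if clock <= deadline else None
        else:
            # failure mid-attempt: unsaved work is lost, fixed repair follows
            clock += ttf + repair
            ttf = rng.weibullvariate(scale, shape)   # as good as new (assumed)
            if done > 0:
                # last full backup plus every incremental backup made since
                restore = q_full + q_inc * ((done - 1) % full_every)
    return None


def estimate(n_trials, seed=0, **params):
    """Estimate the mission success probability and the mean completion
    time over successful missions (conditioning on success is assumed)."""
    rng = random.Random(seed)
    times = [simulate_mission(rng=rng, **params) for _ in range(n_trials)]
    ok = [t for t in times if t is not None]
    p_success = len(ok) / n_trials
    mean_time = sum(ok) / len(ok) if ok else float("nan")
    return p_success, mean_time
```

Under these assumptions, a crude version of the schedule optimization is a grid search: call `estimate` for each candidate pair of `seg` and `full_every` and keep the pair with the highest estimated success probability. More frequent full backups shorten retrieval after a failure but add backup overhead, which is the trade-off the optimization resolves.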
