Performance evaluation of rollback-recovery techniques in computer programs

Rollback recovery is a backward error recovery technique used to recover from temporary faults in database and process control systems. Rollback in process control systems is generally constrained by deadlines and therefore requires dynamic insertion of rollback points. This contrasts with rollback recovery in database systems, in which rollback points are inserted at equidistant intervals. A simple model based on a semi-Markov process is developed to study the performance of rollback recovery strategies. Using this model, the mean program completion time is obtained for both database and process control systems when rollback recovery is implemented. The analytic results from the semi-Markov model are compared with results from extensive computer simulations.
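To make the idea of comparing an analytic mean completion time against simulation concrete, the following is a minimal sketch for the equidistant (database-style) case only. It is not the semi-Markov model of the paper, which the abstract does not spell out. It assumes faults arrive as a Poisson process with rate `lam`, are detected immediately, and do not occur during recovery; the function names and the parameters `checkpoint_cost` and `recovery_cost` are hypothetical and chosen for illustration. Under these assumptions a segment of length tau (work plus the cost of saving a rollback point) has expected duration (1/lam + recovery_cost) * (exp(lam * tau) - 1).

```python
import math
import random


def analytic_mean_completion(work, n_segments, lam, checkpoint_cost, recovery_cost):
    """Expected completion time with n equidistant rollback points.

    Assumed model (not taken from the paper): Poisson faults with rate lam,
    immediate detection, fault-free recovery of fixed length recovery_cost.
    """
    tau = work / n_segments + checkpoint_cost  # one segment plus saving its rollback point
    return n_segments * (1.0 / lam + recovery_cost) * math.expm1(lam * tau)


def simulate_completion(work, n_segments, lam, checkpoint_cost, recovery_cost, rng):
    """One simulated run of the same assumed model; returns the total elapsed time."""
    tau = work / n_segments + checkpoint_cost
    elapsed = 0.0
    for _ in range(n_segments):
        while True:
            time_to_fault = rng.expovariate(lam)
            if time_to_fault >= tau:
                # Segment finished before the next fault arrived.
                elapsed += tau
                break
            # Fault inside the segment: lose the partial work, recover,
            # then retry from the last rollback point.
            elapsed += time_to_fault + recovery_cost
    return elapsed


def simulated_mean_completion(work, n_segments, lam, checkpoint_cost,
                              recovery_cost, runs=20000, seed=0):
    """Average of many simulated runs, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += simulate_completion(work, n_segments, lam,
                                     checkpoint_cost, recovery_cost, rng)
    return total / runs


if __name__ == "__main__":
    params = dict(work=100.0, n_segments=10, lam=0.01,
                  checkpoint_cost=0.5, recovery_cost=1.0)
    print("analytic :", analytic_mean_completion(**params))
    print("simulated:", simulated_mean_completion(**params))
```

With enough runs the two printed values should agree closely, which is the kind of check the abstract describes. Varying `n_segments` shows the usual trade-off: more rollback points reduce the work lost per fault but add checkpointing overhead. The deadline-constrained, dynamically placed rollback points of process control systems are not covered by this sketch.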
