Adaptive Checkpointing Schemes for Fault Tolerance in Real-Time Systems with Task Duplication

This paper studies dynamic adaptation techniques based on checkpointing. By placing store-checkpoints and compare-checkpoints between CSCPs (store-and-compare checkpoints), we first present adaptive checkpointing schemes in which the checkpointing interval for a task is adjusted dynamically online. Taking the overheads of comparison and storage into account, we derive the average execution time to complete a task under each proposed scheme using renewal equations. We then analyze the optimal numbers of checkpoints that minimize these average execution times, and extend the proposed schemes to sets of multiple tasks in real-time systems. Simulation results show that, compared with previous methods, the proposed approach significantly increases the likelihood of timely task completion.
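
To make the renewal-equation argument concrete, the following is a minimal sketch of a deliberately simplified model, not the schemes proposed in this paper. It assumes a duplicated task of length `task_length` divided into `n_intervals` equal intervals, a CSCP with fixed overhead `checkpoint_overhead` at the end of each interval, and faults arriving as a Poisson process with rate `fault_rate` during computation only. A fault is detected only at the next comparison, after which the interval is re-executed. Under these assumptions the renewal equation E = (τ + c) + (1 − p)E, with τ the interval length, c the overhead, and p = exp(−λτ), gives E = (τ + c)/p per interval. All function and parameter names here are hypothetical.

```python
import math


def expected_completion_time(task_length, n_intervals, fault_rate, checkpoint_overhead):
    """Mean completion time under the simplified CSCP-only model (an assumption)."""
    interval = task_length / n_intervals
    # Probability that no fault hits either copy during one interval.
    p_ok = math.exp(-fault_rate * interval)
    # Renewal equation: E = (interval + overhead) + (1 - p_ok) * E.
    per_interval = (interval + checkpoint_overhead) / p_ok
    return n_intervals * per_interval


def best_interval_count(task_length, fault_rate, checkpoint_overhead, max_intervals=200):
    """Brute-force search for the interval count with the smallest mean time."""
    return min(
        range(1, max_intervals + 1),
        key=lambda n: expected_completion_time(
            task_length, n, fault_rate, checkpoint_overhead
        ),
    )


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    n = best_interval_count(100.0, 0.05, 1.0)
    print(n, expected_completion_time(100.0, n, 0.05, 1.0))
```

With a fault rate of zero the search returns a single interval, since every extra checkpoint only adds overhead. As the fault rate grows, shorter intervals pay off because less work is lost per rollback. The schemes in the paper refine this trade-off further by inserting cheaper store-checkpoints and compare-checkpoints between CSCPs.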
