Roll-Forward Checkpointing Schemes

In modular redundant systems, tasks are replicated to achieve fault tolerance. Checkpointing schemes that exploit this replication can outperform schemes that ignore how the fault detection mechanism is implemented [24]. This chapter presents two such schemes: the Dynamic Roll-Forward Checkpointing Scheme and the Static Roll-Forward Checkpointing Scheme.
