Paper information - Impossibility of scalar clock-based communication-induced checkpointing protocols ensuring the RDT property

Abstract: Communication-induced checkpointing protocols are an interesting approach to determining, on-line, checkpoint and communication patterns that enjoy desirable properties such as domino-effect freedom. They do not add control messages to the computation; instead, they may attach control information to computation messages. Among these protocols, scalar clock-based protocols are particularly attractive because they use a single integer as control information. An interesting property of checkpoint and communication patterns is Rollback-Dependency Trackability (RDT), which ensures that all local checkpoint dependencies are trackable on the fly. It would therefore be desirable to design scalar clock-based communication-induced checkpointing protocols that provide the RDT property; whether this can be done was previously an open question. This paper shows that the design of such protocols is impossible.
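To make the notion of a scalar clock-based protocol concrete, here is a minimal sketch of the classic index-based rule in the style of the domino-free algorithm listed as [7] below. It is an illustration only, not a construction from this paper: the class and method names (`Process`, `Message`, `basic_checkpoint`, `receive`) are hypothetical. Each process keeps one integer, piggybacks it on every application message, and takes a forced checkpoint before delivering a message whose integer is larger than its own. A rule of this kind prevents the domino effect; per the abstract, no protocol that carries only such a single integer can guarantee RDT.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # the only control information: one piggybacked integer


@dataclass
class Process:
    name: str
    sn: int = 0  # scalar clock: index of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts with an initial checkpoint at index 0.
        self.checkpoints.append((self.sn, "initial"))

    def basic_checkpoint(self):
        """Checkpoint taken on the process's own initiative."""
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        """Attach the current index to an application message."""
        return Message(payload, self.sn)

    def receive(self, msg):
        """Take a forced checkpoint before delivery if the sender is ahead."""
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.checkpoints.append((self.sn, "forced"))
        return msg.payload


p, q = Process("p"), Process("q")
p.basic_checkpoint()   # p's index becomes 1
m = p.send("hello")    # m carries sn = 1
q.receive(m)           # q is at 0 < 1, so it takes a forced checkpoint first
print(q.sn, q.checkpoints)  # prints: 1 [(0, 'initial'), (1, 'forced')]
```

No extra control messages are exchanged in this sketch; the forced checkpoint is triggered purely by the integer carried on the application message, which is the design constraint the abstract describes.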

Achour Mostéfaoui | Michel Raynal | Roberto Baldoni | Jean-Michel Hélary

[1] Michel Raynal, et al. Consistency Issues in Distributed Checkpoints, 1999, IEEE Trans. Software Eng.

[2] Sy-Yen Kuo, et al. Theoretical Analysis for Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability, 1998, IEEE Trans. Parallel Distributed Syst.

[3] Yin-Min Wang, et al. Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints, 1997, IEEE Trans. Computers.

[4] Achour Mostéfaoui, et al. A communication-induced checkpointing protocol that ensures rollback-dependency trackability, 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.

[5] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system, 1978.

[6] Bruno Ciciani, et al. A VP-accordant checkpointing protocol preventing useless checkpoints, 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).

[7] Augusto Ciuffoletti, et al. A Distributed Domino-Effect free recovery Algorithm, 1984, Symposium on Reliability in Distributed Software and Database Systems.

[8] Jian Xu, et al. Necessary and Sufficient Conditions for Consistent Global Snapshots, 1995, IEEE Trans. Parallel Distributed Syst.

[9] Sy-Yen Kuo, et al. Evaluations of Domino-Free Communication-Induced Checkpointing Protocols, 1999, Inf. Process. Lett.

[10] Leslie Lamport, et al. Time, clocks, and the ordering of events in a distributed system, 1978, CACM.

[11] Brian Randell, et al. System structure for software fault tolerance, 1975, IEEE Transactions on Software Engineering.

[12] David L. Russell, et al. State Restoration in Systems of Communicating Processes, 1980, IEEE Transactions on Software Engineering.

[13] Achour Mostéfaoui, et al. Virtual Precedence in Asynchronous Systems: Concept and Applications, 1997, WDAG.

[14] Hon Fung Li, et al. Optimal Checkpointing and Local Recording for Domino-Free Rollback Recovery, 1987, Inf. Process. Lett.

[15] Leslie Lamport, et al. Distributed snapshots: determining global states of distributed systems, 1985, TOCS.

[16] Achour Mostéfaoui, et al. Communication-Induced Determination of Consistent Snapshots, 1999, IEEE Trans. Parallel Distributed Syst.

[17] D. Manivannan, et al. A low-overhead recovery technique using quasi-synchronous checkpointing, 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[18] Roberto Baldoni, et al. An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems, 1999, IEEE Trans. Parallel Distributed Syst.

[19] D. Manivannan, et al. Quasi-Synchronous Checkpointing: Models, Characterization, and Classification, 1999, IEEE Trans. Parallel Distributed Syst.

[20] Achour Mostéfaoui, et al. Communication-based prevention of useless checkpoints in distributed computations, 2000, Distributed Computing.