Paper information - Efficient Message Logging for Uncoordinated Checkpointing Protocols

A message is in-transit with respect to a global state if its sending is recorded in that global state but its receipt is not. Checkpointing algorithms have to log such in-transit messages in order to restore the state of the channels when a computation is resumed from a consistent global state after a failure. Coordinated checkpointing algorithms log exactly those in-transit messages on stable storage. Because they lack synchronization, uncoordinated checkpointing algorithms conservatively log more messages.
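The definition of an in-transit message can be illustrated with a minimal Python sketch. It is not taken from the paper: the `Message` record, the event-index representation of a global state (`cut`), and the function names are all assumptions made for illustration. A global state is modeled as a map from each process to the index of the last local event it records, with events numbered from 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical message record (not from the paper)."""
    sender: str
    receiver: str
    send_index: int            # position of the send event in the sender's history
    recv_index: Optional[int]  # position of the receive event, None if never received


def is_in_transit(msg: Message, cut: Dict[str, int]) -> bool:
    """True if the send is recorded in the global state but the receipt is not."""
    send_recorded = msg.send_index <= cut[msg.sender]
    recv_recorded = (
        msg.recv_index is not None and msg.recv_index <= cut[msg.receiver]
    )
    return send_recorded and not recv_recorded


def in_transit_messages(messages: List[Message], cut: Dict[str, int]) -> List[Message]:
    """Messages that must be logged to restore channel states for this cut."""
    return [m for m in messages if is_in_transit(m, cut)]


# Illustrative data: each process has recorded its first three local events.
cut = {"p1": 3, "p2": 3}
messages = [
    Message("p1", "p2", send_index=1, recv_index=1),  # sent and received inside the cut
    Message("p1", "p2", send_index=2, recv_index=4),  # sent inside, received outside
    Message("p2", "p1", send_index=5, recv_index=6),  # sent outside the cut
]
to_log = in_transit_messages(messages, cut)
```

In this example only the second message is in-transit with respect to `cut`, so it is the only one that would need to be logged to restore the channel from `p1` to `p2`.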

Achour Mostéfaoui | Michel Raynal
