A synchronous checkpointing protocol for mobile distributed systems: probabilistic approach

Coordinated checkpointing is a method that minimises number of processes to checkpoint for an initiation. It may require blocking of processes, extra synchronisation messages or useless checkpoints. We propose a minimum process coordinated checkpointing algorithm where the number of useless checkpoints and blocking are reduced using a probabilistic approach that computes an interacting set of processes on checkpoint initiation. A process checkpoints if the probability that it will get a checkpoint request in current initiation is high. A few processes may be blocked but they can continue their normal computation and may send messages. We also modified methodology to maintain exact dependencies.

[1]  RICHARD KOO,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[2]  Nuno Neves,et al.  Adaptive recovery for mobile environments , 1997, CACM.

[3]  Leslie Lamport,et al.  Time, clocks, and the ordering of events in a distributed system , 1978, CACM.

[4]  Junguk L. Kim,et al.  An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..

[5]  Makoto Takizawa,et al.  Checkpoint-recovery protocol for reliable mobile systems , 1998, Proceedings Seventeenth IEEE Symposium on Reliable Distributed Systems (Cat. No.98CB36281).

[6]  Mukesh Singhal,et al.  Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..

[7]  Leslie Lamport,et al.  Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[8]  Mukesh Singhal,et al.  On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems , 1998, Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205).

[9]  Mukesh Singhal,et al.  On Coordinated Checkpointing in Distributed Systems , 1998, IEEE Trans. Parallel Distributed Syst..

[10]  B. R. Badrinath,et al.  Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[11]  Lalit Kumar,et al.  Low overhead optimal checkpointing for mobile distributed systems , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[12]  P. Kumar,et al.  A non-intrusive minimum process synchronous checkpointing protocol for mobile distributed systems , 2005, 2005 IEEE International Conference on Personal Wireless Communications, 2005. ICPWC 2005..

[13]  Mukesh Singhal,et al.  Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems , 2001, IEEE Trans. Parallel Distributed Syst..

[14]  Willy Zwaenepoel,et al.  The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.

[15]  Lalit Kumar,et al.  Checkpointing in Distributed Computing Systems , 2002 .