On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems

Mobile computing raises many new issues, such as lack of stable storage, low bandwidth of wireless channels, high mobility, and limited battery life. These issues make traditional checkpointing algorithms unsuitable. R. Prakash and M. Singhal (1996) proposed the first coordinated checkpointing algorithm for mobile computing systems. However, we showed that their algorithm may result in an inconsistency. In this paper, we prove a more general result about coordinated checkpointing: no non-blocking algorithm exists that forces only a minimum number of processes to take their checkpoints. Based on the proof, we propose an efficient algorithm for mobile computing systems that forces only a minimum number of processes to take checkpoints and dramatically reduces the blocking time during the checkpointing process. Correctness proofs and a performance analysis of the algorithm are provided.

[1] John Zahorjan, et al. The challenges of mobile computing, 1994, Computer.

[2] Jian Xu, et al. Necessary and Sufficient Conditions for Consistent Global Snapshots, 1995, IEEE Trans. Parallel Distributed Syst.

[3] Mukesh Singhal, et al. Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems, 1996, IEEE Trans. Parallel Distributed Syst.

[4] Charles E. Perkins, et al. A mobile networking system based on Internet protocol, 1993, IEEE Personal Communications.

[5] Yong Deng, et al. Checkpointing and rollback-recovery algorithms in distributed systems, 1994, J. Syst. Softw.

[6] Robert E. Strom, et al. Optimistic recovery in distributed systems, 1985, TOCS.

[7] Dhiraj K. Pradhan, et al. Recovery in distributed mobile environments, 1993, Proceedings 1993 IEEE Workshop on Advances in Parallel and Distributed Systems.

[8] Gerald Q. Maguire, et al. IP-based protocols for mobile internetworking, 1991, SIGCOMM 1991.

[9] R. K. Shyamasundar, et al. Introduction to algorithms, 1996.

[10] Richard Koo, et al. Checkpointing and Rollback-Recovery for Distributed Systems, 1986, IEEE Transactions on Software Engineering.

[11] Luís Moura Silva, et al. Global checkpointing for distributed programs, 1992, Proceedings 11th Symposium on Reliable Distributed Systems.

[12] B. R. Badrinath, et al. Checkpointing distributed applications on mobile computers, 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[13] Willy Zwaenepoel, et al. The performance of consistent checkpointing, 1992, Proceedings 11th Symposium on Reliable Distributed Systems.

[14] Mukesh Singhal, et al. Maximal global snapshot with concurrent initiators, 1994, Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing.

[15] Junguk L. Kim, et al. An Efficient Protocol for Checkpointing Recovery in Distributed Systems, 1993, IEEE Trans. Parallel Distributed Syst.