A novel min-process checkpointing scheme for mobile computing systems

In distributed computing systems, processes in different hosts take checkpoints to survive failures. For mobile computing systems, due to certain new characteristics such as mobility, low bandwidth, disconnection, low power consumption and limited memory, conventional distributed checkpointing schemes need to be reconsidered. In this paper, a novel min-process coordinated checkpointing algorithm that makes full use of the computation ability and power of mobile support stations is proposed. During normal computation message transmission, the checkpoint dependency information among mobile hosts is recorded in the corresponding mobile support stations. When a checkpointing procedure begins, the initiator concurrently informs relevant mobile hosts, which minimizes the identifying time. Moreover, compared with the existing coordinated checkpointing schemes, our algorithm blocks the minimum number of mobile support stations during the identifying procedure, which leads to the improvement of the system performance. In addition, the proposed algorithm is a min-process, domino-free checkpointing algorithm, which is especially desirable for mobile computing systems. Quantitative analysis and experimental simulation show that our algorithm outperforms other coordinated checkpointing schemes in terms of the identifying time and the number of blocked mobile support stations and then can provide a better system performance for mobile computing systems.

[1]  Shing-Tsaan Huang,et al.  Detecting termination of distributed computations by external agents , 1989, [1989] Proceedings. The 9th International Conference on Distributed Computing Systems.

[2]  Yong Deng,et al.  Checkpointing and rollback-recovery algorithms in distributed systems , 1994, J. Syst. Softw..

[3]  Luís Moura Silva,et al.  Global checkpointing for distributed programs , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.

[4]  Jiannong Cao,et al.  Checkpointing and rollback of wide-area distributed applications using mobile agents , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.

[5]  B. R. Badrinath,et al.  Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[6]  Willy Zwaenepoel,et al.  The performance of consistent checkpointing , 1992, [1992] Proceedings 11th Symposium on Reliable Distributed Systems.

[7]  Mukesh Singhal,et al.  Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..

[8]  Leslie Lamport,et al.  Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[9]  Bernhard Walke,et al.  IEEE 802.11 Wireless Local Area Networks , 2006 .

[10]  Flaviu Cristian,et al.  A timestamp-based checkpointing protocol for long-lived distributed computations , 1991, [1991] Proceedings Tenth Symposium on Reliable Distributed Systems.

[11]  Parameswaran Ramanathan,et al.  Use of Common Time Base for Checkpointing and Rollback Recovery in a Distributed System , 1993, IEEE Trans. Software Eng..

[12]  Yi-Min Wang,et al.  Maximum and minimum consistent global checkpoints and their applications , 1995, Proceedings. 14th Symposium on Reliable Distributed Systems.

[13]  WangYi-Min Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints , 1997 .

[14]  Taesoon Park,et al.  Checkpointing and rollback-recovery in distributed systems , 1989 .

[15]  Junguk L. Kim,et al.  An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..

[16]  Mukesh Singhal,et al.  On the impossibility of min-process non-blocking checkpointing and an efficient checkpointing algorithm for mobile computing systems , 1998, Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205).