DESIGN AND PERFORMANCE ANALYSIS OF COORDINATED CHECKPOINTING ALGORITHMS FOR DISTRIBUTED MOBILE SYSTEMS

Checkpointing is an efficient fault tolerance technique used in distributed systems. Mobile computing raises many new issues, such as high mobility, lack of stable storage on mobile hosts (MHs), low bandwidth of wireless channels, limited battery life, and disconnections, that make traditional checkpointing protocols unsuitable for such systems. Several checkpointing algorithms have been reported in the literature. In this paper, we analyze some of the existing coordinated checkpointing algorithms on the basis of blocking time, synchronization message overhead, number of processes required to checkpoint, number of useless checkpoints, information piggybacked onto computation messages, and concurrent execution. We also propose an efficient checkpointing algorithm that reduces checkpointing overhead. Our algorithm has no synchronization message overhead because it uses time to coordinate indirectly, creating a consistent cut in a distributed mobile system without increasing the number of checkpoints.
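
To illustrate how time can coordinate checkpoints without synchronization messages, the following is a minimal sketch of a generic timer-driven scheme. It is not the algorithm proposed in this paper. The class `Process`, its fields, and the shared checkpoint `interval` are hypothetical names chosen for the example, and the sketch assumes that local clocks drift only slightly. Each process checkpoints when its own timer expires, and each computation message piggybacks the sender's checkpoint index so that a receiver whose timer has not yet fired checkpoints before delivery.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    interval: float   # checkpoint period shared by all processes (assumption)
    csn: int = 0      # index of the latest local checkpoint
    state: int = 0    # toy application state: number of delivered messages
    checkpoints: list = field(default_factory=list)

    def _take_checkpoint(self, csn):
        # Record the state under the given checkpoint index.
        self.csn = csn
        self.checkpoints.append((csn, self.state))

    def tick(self, local_time):
        # Timer-driven checkpoint: no coordination message is exchanged.
        due = int(local_time // self.interval)
        if due > self.csn:
            self._take_checkpoint(due)

    def send(self):
        # Piggyback the current checkpoint index on a computation message.
        return {"src": self.pid, "csn": self.csn}

    def receive(self, msg):
        # The sender has already entered a later interval: checkpoint before
        # delivery so that the message cannot become an orphan.
        if msg["csn"] > self.csn:
            self._take_checkpoint(msg["csn"])
        self.state += 1


# Hypothetical run: p's timer fires slightly before q's.
p, q = Process(0, 10.0), Process(1, 10.0)
p.tick(10.2)          # p takes checkpoint 1
q.receive(p.send())   # q takes checkpoint 1 before delivering the message
q.tick(10.5)          # q's timer fires; checkpoint 1 already exists
```

In this run the checkpoint triggered by the message stands in for the one q's timer would have taken, so q still records a single checkpoint for the interval and the pair of checkpoints with index 1 contains no orphan message.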
