A MINIMUM PROCESS SYNCHRONOUS CHECKPOINTING ALGORITHM FOR MOBILE DISTRIBUTED SYSTEM

A distributed system is a collection of independent entities that cooperate to solve a problem that none of them can solve individually. A mobile computing system is a distributed system in which some of the processes run on mobile hosts (MHs), whose location in the network changes with time. The number of processes that take checkpoints is minimized in order to 1) avoid awakening MHs that are in doze mode, 2) minimize thrashing of MHs with checkpointing activity, and 3) conserve the limited battery life of MHs and the low bandwidth of wireless channels. Minimum-process checkpointing protocols either take some useless checkpoints or block processes. In this paper, we propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems in which no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We also try to reduce the loss of checkpointing effort when a process fails to take its checkpoint in coordination with the others.
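The idea of a minimum set of checkpointing processes can be illustrated with a small sketch. It is not the algorithm proposed in this paper; it only assumes the usual notion that a process must checkpoint with the initiator if the initiator depends on it, directly or transitively, through messages received since the last checkpoint. The names `minimum_checkpoint_set` and `depends_on`, and the process labels, are hypothetical.

```python
from collections import deque


def minimum_checkpoint_set(initiator, depends_on):
    """Return the processes that must checkpoint together with `initiator`.

    `depends_on[p]` is the set of processes from which p has received a
    message since its last checkpoint (an assumed representation, not the
    paper's data structure).
    """
    selected = {initiator}
    to_visit = deque([initiator])
    while to_visit:
        p = to_visit.popleft()
        # Every process p depends on must also checkpoint, otherwise the
        # receive recorded at p would have no matching recorded send.
        for q in depends_on.get(p, ()):
            if q not in selected:
                selected.add(q)
                to_visit.append(q)
    return selected


# Hypothetical dependencies: P1 received from P2, P2 received from P3,
# P4 received from P1, and P5 exchanged no messages.
depends_on = {
    "P1": {"P2"},
    "P2": {"P3"},
    "P3": set(),
    "P4": {"P1"},
    "P5": set(),
}
minimum_set = minimum_checkpoint_set("P1", depends_on)
print(sorted(minimum_set))
```

In this example P4 and P5 are left undisturbed, which is the property that lets MHs in doze mode stay asleep when the initiator does not depend on them.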
