A Minimum-Process Coordinated Checkpointing Protocol for Mobile Computing Systems

A checkpoint is a designated point in a program at which normal processing is interrupted specifically to preserve the status information needed to resume processing at a later time. A checkpointing algorithm for mobile distributed systems must handle several new issues: mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and the high failure rate of mobile nodes. These issues make traditional checkpointing techniques unsuitable for such environments. Minimum-process coordinated checkpointing is an attractive approach for introducing fault tolerance into mobile distributed systems transparently. This approach is domino-free, requires at most two checkpoints of a process on stable storage, and forces only a minimum number of processes to checkpoint. However, it requires extra synchronization messages, blocking of the underlying computation, or the taking of some useless checkpoints. In this paper, we design a minimum-process checkpointing algorithm for mobile distributed systems in which no useless checkpoint is taken. We reduce the blocking of processes by allowing them to perform their normal computations, send messages, and receive selected messages during their blocking period.