Checkpoint-recovery protocol for reliable mobile systems

Information systems consist of mobile stations and fixed stations. Mission-critical applications must be executed fault-tolerantly in these systems. However, mobile stations have neither enough storage and processing power nor enough battery capacity for reliable, long-term communication. Moreover, wireless channels are less reliable than wired ones, so the channels to mobile stations are often disconnected. It is therefore difficult for multiple mobile stations to take checkpoints synchronously, since their communication channels may be disconnected even while the checkpoints are being taken. In this paper, we propose a novel hybrid checkpointing protocol in which mobile stations take checkpoints asynchronously and fixed stations take checkpoints synchronously. The hybrid checkpointing protocol makes it possible to realize reliable information systems that include mobile stations.
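The split between the two kinds of stations can be illustrated with a minimal sketch. It is not the protocol proposed in the paper, whose details are not given here; the `Station` class, the `checkpoint_now` method, the `coordinated_checkpoint` function, and the two-phase tentative/commit round are all hypothetical names and simplifying assumptions. The sketch only shows that fixed stations save their states together in one coordinated round, while a mobile station saves its state on its own schedule without waiting for any other station.

```python
class Station:
    """Hypothetical station holding an integer state and saved checkpoints."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.checkpoints = []   # committed checkpoints
        self.tentative = None   # used only during a coordinated round

    def checkpoint_now(self):
        # Asynchronous (independent) checkpoint: no coordination with anyone.
        self.checkpoints.append(self.state)


def coordinated_checkpoint(fixed_stations):
    """Assumed two-phase round: record tentative checkpoints, then commit all."""
    # Phase 1: every fixed station records a tentative checkpoint.
    for s in fixed_stations:
        s.tentative = s.state
    # Phase 2: all tentative checkpoints are committed together.
    for s in fixed_stations:
        s.checkpoints.append(s.tentative)
        s.tentative = None


fixed = [Station("F1"), Station("F2")]
mobile = Station("M1")

fixed[0].state, fixed[1].state, mobile.state = 3, 5, 7
mobile.checkpoint_now()          # mobile station checkpoints on its own schedule
coordinated_checkpoint(fixed)    # fixed stations checkpoint as a group
mobile.state = 8
mobile.checkpoint_now()          # another independent checkpoint, no round needed
```

In this sketch the mobile station never takes part in the coordinated round, so a disconnected wireless channel cannot stall the fixed stations' checkpoint.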
