An asynchronous recovery scheme based on optimistic message logging for mobile computing systems

This paper presents an asynchronous recovery scheme to provide fault-tolerance for mobile computing systems. The proposed scheme is based on optimistic message logging, since the checkpointing-only schemes are not suitable for the mobile environment in which unreliable mobile hosts and fragile network connection may hinder any kind of coordination for checkpointing and recovery. Also, in order to reduce the overhead imposed on mobile hosts, mobile support stations take charge of logging and dependency tracking, and mobile hosts maintain only a small amount of information for mobility tracking. As a result, truly asynchronous recovery for mobile systems can be achieved with little overhead.

[1]  Mukesh Singhal,et al.  Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..

[2]  Tomasz Imielinski,et al.  Structuring distributed algorithms for mobile hosts , 1994, 14th International Conference on Distributed Computing Systems.

[3]  Leslie Lamport,et al.  Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[4]  Bharat K. Bhargava,et al.  Independent checkpointing and concurrent rollback for recovery in distributed systems-an optimistic approach , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.

[5]  Dhiraj K. Pradhan,et al.  Recoverable mobile environment: design and trade-off analysis , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[6]  Mukesh Singhal,et al.  Low-cost checkpointing with mutable checkpoints in mobile computing systems , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).

[7]  Nuno Neves,et al.  Adaptive recovery for mobile environments , 1997, CACM.

[8]  Lorenzo Alvisi,et al.  Message logging: pessimistic, optimistic, and causal , 1995, Proceedings of 15th International Conference on Distributed Computing Systems.

[9]  W. Kent Fuchs,et al.  Lazy checkpoint coordination for bounding rollback propagation , 1992, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.

[10]  Yuval Tamir,et al.  ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS , 1984 .

[11]  Nuno Neves,et al.  Adaptive recovery for mobile environments , 1996, Proceedings. IEEE High-Assurance Systems Engineering Workshop (Cat. No.96TB100076).

[12]  A. Prasad Sistla,et al.  Efficient distributed recovery using message logging , 1989, PODC '89.

[13]  Brian Randell System structure for software fault tolerance , 1975 .

[14]  B. R. Badrinath,et al.  Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[15]  Willy Zwaenepoel,et al.  Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.

[16]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[17]  Augusto Ciuffoletti,et al.  A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.

[18]  D. Manivannan,et al.  Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems , 1996 .

[19]  Junguk L. Kim,et al.  An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..

[20]  Sean W. Smith,et al.  Completely asynchronous optimistic recovery with minimal rollbacks , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[21]  Lorenzo Alvisi,et al.  Nonblocking and orphan-free message logging protocols , 1992, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[22]  RICHARD KOO,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[23]  Vijay K. Garg,et al.  How to recover efficiently and asynchronously when optimism fails , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[24]  Hon Fung Li,et al.  Optimal Checkpointing and Local Recording for Domino-Free Rollback Recovery , 1987, Inf. Process. Lett..

[25]  Brian Randell,et al.  Reliability Issues in Computing System Design , 1978, CSUR.