An Efficient Optimistic Message Logging Scheme for Recoverable Mobile Computing Systems

A number of checkpointing and message logging algorithms have been proposed to support fault tolerance in mobile computing systems. However, little attention has been paid to optimistic message logging, which has a lower failure-free operation cost than other logging schemes and a lower failure recovery cost than checkpointing schemes. This paper presents an efficient scheme for implementing optimistic logging in the mobile computing environment. In the proposed scheme, the task of logging is assigned to the mobile support station so that volatile logging can be used. In addition, to reduce message overhead, the mobile support station takes care of dependency tracking, and the potential dependency between mobile hosts is inferred from the dependency between mobile support stations. The performance of the proposed scheme is evaluated through an extensive simulation study. The results show that the proposed scheme incurs a small failure-free overhead, and that the cost of unnecessary rollbacks caused by imprecise dependency information can be adjusted by properly selecting the logging frequency.
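The core idea above, tracking dependencies per mobile support station (MSS) rather than per mobile host (MH), can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the paper's actual protocol: the hypothetical `MSS` class, the vector-style dependency tags (in the spirit of classic optimistic logging), the rule that every delivery into a cell starts a new state interval, and the `flush` method standing in for the logging frequency. It shows why the inferred dependency is imprecise: an orphan check made at MSS granularity rolls back every host in a cell, including hosts that never received the lost messages.

```python
from dataclasses import dataclass, field


@dataclass
class MSS:
    """Hypothetical MSS that logs messages in volatile memory for the MHs in
    its cell and tracks dependencies per MSS, not per MH (an assumption)."""
    mss_id: int
    n_mss: int
    hosts: set = field(default_factory=set)
    interval: int = 0         # current state-interval index of this MSS
    stable_interval: int = 0  # last interval whose volatile log reached stable storage
    dv: list = field(init=False)
    stable_dv: list = field(init=False)

    def __post_init__(self):
        # dv[j]: highest interval of MSS j that this cell may depend on.
        self.dv = [0] * self.n_mss
        self.stable_dv = [0] * self.n_mss

    def piggyback(self):
        """Dependency tag attached to a message sent by any MH in this cell."""
        tag = list(self.dv)
        tag[self.mss_id] = self.interval
        return tag

    def deliver(self, tag):
        """Log a received message (volatile) and merge the sender's dependencies."""
        self.interval += 1  # assumption: each delivery starts a new state interval
        self.dv = [max(a, b) for a, b in zip(self.dv, tag)]
        self.dv[self.mss_id] = self.interval

    def flush(self):
        """Write the volatile log to stable storage. Calling this more often
        (a higher logging frequency) shrinks the window of losable intervals."""
        self.stable_interval = self.interval
        self.stable_dv = list(self.dv)

    def lose_volatile_log(self):
        """Roll back to the last stable interval and return it for announcement."""
        self.interval = self.stable_interval
        self.dv = list(self.stable_dv)
        return self.interval

    def orphaned_hosts(self, failed_id, recovered_interval):
        """MHs that must roll back. The check is per MSS, so every MH in the
        cell is returned, even MHs that never saw the lost messages."""
        if self.dv[failed_id] > recovered_interval:
            return set(self.hosts)
        return set()


a, b = MSS(0, 2, {"mh1"}), MSS(1, 2, {"mh2", "mh3"})
b.deliver(a.piggyback())   # mh1 -> mh2
a.deliver(b.piggyback())   # a message into a's cell: a moves to interval 1
b.deliver(a.piggyback())   # mh1 -> mh2 again: b now depends on a's interval 1
r = a.lose_volatile_log()  # a never flushed, so it falls back to interval 0
print(sorted(b.orphaned_hosts(0, r)))
```

In this run `mh3` is rolled back along with `mh2` even though it never communicated, which is the unnecessary rollback the abstract attributes to imprecise dependency. If `a.flush()` were called before the loss, `a` would recover to interval 1 and `b` would not need to roll back at all, which is one way to picture how the logging frequency trades failure-free cost against rollback cost.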
