An efficient recovery scheme for fault-tolerant mobile computing systems

Abstract This paper presents an efficient recovery scheme to provide fault-tolerance for the mobile computing systems. The proposed scheme is based on the message logging and the independent checkpointing, and for the efficient management of the recovery information, such as checkpoints and message logs, the movement-based scheme is suggested. The mobile host carrying its recovery information to the nearby mobile support station can recover instantly in case of a failure. However, the support stations visited by the mobile host may have to experience the high failure-free execution cost for transferring the recovery information and accessing the stable storage. On the other hand, the recovery cost can be too high, if the recovery information remains dispersed over a number of mobile support stations. The movement-based scheme considers both of the failure-free execution cost and the failure-recovery cost. Hence, while a mobile host moves within a certain range, the recovery information of the mobile host remains at the support stations where the information was first saved. However, if the mobile host moves out of the range, it transfers the recovery information into the nearby support station. As a result, the scheme can control the transfer cost as well as the recovery cost. The performance of the proposed scheme is evaluated with extensive simulation study.

[1]  Augusto Ciuffoletti,et al.  A Distributed Domino-Effect free recovery Algorithm , 1984, Symposium on Reliability in Distributed Software and Database Systems.

[2]  D. Manivannan,et al.  Failure Recovery based on Quasi-Synchronous Checkpointing in Mobile Computing Systems , 1996 .

[3]  Brian Randell,et al.  Reliability Issues in Computing System Design , 1978, CSUR.

[4]  Dhiraj K. Pradhan,et al.  Recoverable mobile environment: design and trade-off analysis , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[5]  Mukesh Singhal,et al.  Low-cost checkpointing with mutable checkpoints in mobile computing systems , 1998, Proceedings. 18th International Conference on Distributed Computing Systems (Cat. No.98CB36183).

[6]  Nuno Neves,et al.  Adaptive recovery for mobile environments , 1997, CACM.

[7]  Lorenzo Alvisi,et al.  Message logging: pessimistic, optimistic, and causal , 1995, Proceedings of 15th International Conference on Distributed Computing Systems.

[8]  W. Kent Fuchs,et al.  Message logging in mobile computing , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[9]  Mukesh Singhal,et al.  Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..

[10]  Tomasz Imielinski,et al.  Structuring distributed algorithms for mobile hosts , 1994, 14th International Conference on Distributed Computing Systems.

[11]  Leslie Lamport,et al.  Distributed snapshots: determining global states of distributed systems , 1985, TOCS.

[12]  A. Prasad Sistla,et al.  Efficient distributed recovery using message logging , 1989, PODC '89.

[13]  Vijay K. Garg,et al.  Distributed recovery with K-optimistic logging , 1997, Proceedings of 17th International Conference on Distributed Computing Systems.

[14]  W. Kent Fuchs,et al.  Lazy checkpoint coordination for bounding rollback propagation , 1992, Proceedings of 1993 IEEE 12th Symposium on Reliable Distributed Systems.

[15]  Junguk L. Kim,et al.  An Efficient Protocol for Checkpointing Recovery in Distributed Systems , 1993, IEEE Trans. Parallel Distributed Syst..

[16]  Vijay K. Garg,et al.  How to recover efficiently and asynchronously when optimism fails , 1996, Proceedings of 16th International Conference on Distributed Computing Systems.

[17]  L. Alvisi,et al.  Nonblocking and Orphan-Free Message Logging Protocols , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..

[18]  Hon Fung Li,et al.  Optimal Checkpointing and Local Recording for Domino-Free Rollback Recovery , 1987, Inf. Process. Lett..

[19]  Heon Young Yeom,et al.  Application controlled checkpointing coordination for fault-tolerant distributed computing systems , 2000, Parallel Comput..

[20]  Willy Zwaenepoel,et al.  Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit , 1992, IEEE Trans. Computers.

[21]  B. R. Badrinath,et al.  Checkpointing distributed applications on mobile computers , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[22]  Heon Young Yeom,et al.  An asynchronous recovery scheme based on optimistic message logging for mobile computing systems , 2000, Proceedings 20th IEEE International Conference on Distributed Computing Systems.

[23]  Keqin Li,et al.  Optimal dynamic location update for PCS networks , 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).

[24]  Sean W. Smith,et al.  Completely asynchronous optimistic recovery with minimal rollbacks , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[25]  Yuval Tamir,et al.  ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS , 1984 .

[26]  Ian F. Akyildiz,et al.  On location management for personal communications networks , 1996 .

[27]  RICHARD KOO,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.