Analysis of Recent Checkpointing Techniques for Mobile Computing Systems

Recovery from transient failures is a central concern in distributed systems, which need recovery techniques that are both transparent and efficient. A checkpoint is a designated point in a program at which normal processing is interrupted to preserve status information; checkpointing is the process of saving that information. Mobile computing systems often suffer from high rates of failures that are transient and independent in nature. To add reliability and high availability to such distributed systems, checkpoint-based rollback recovery is one of the most widely used techniques, with applications in scientific computing, databases, telecommunications, and mission-critical systems. This paper surveys the checkpointing algorithms reported in the literature for mobile computing systems.
