A Proxy Based Efficient Checkpointing Scheme for Fault Recovery in Mobile Grid System

Mobile Grid is an emerging and thriving field of distributed computing in which mobile devices enjoy the benefits of the Grid. The challenges facing the mobile Grid include unpredictable network quality, lower trust, limited resources (battery power, network bandwidth, storage, processing power, etc.) and extended periods of disconnection, which may result in loss of the work done by these devices. We therefore need a proper fault tolerance scheme for these mobile hosts. A major issue is handling failures appropriately with minimal processing and storage overhead on the mobile hosts. To meet these goals, we propose a proxy-based coordinated checkpointing scheme for our mobile-to-Grid middleware, Mobile Access to Grid Infrastructure (MAGi). In this scheme, mobile hosts seamlessly store their checkpoints on their respective proxies running on the middleware. Together with the central coordinator component, these proxies act as a centralized checkpoint store. This approach makes it efficient to roll back to the latest consistent global snapshot without the direct involvement of the mobile hosts, which results in lower processing and storage overhead on mobile devices than existing schemes incur.
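The checkpoint-and-rollback flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MAGi implementation: `Proxy`, `Coordinator`, their methods and the `reachable` flag are hypothetical names. The sketch assumes a blocking two-phase round (every proxy takes a tentative checkpoint, then the coordinator commits or aborts the round) and assumes application messages are held while a round is in progress, so each committed round forms a consistent global snapshot. Mobile hosts only push their state to their proxy; rollback is served entirely from proxy storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Proxy:
    """Hypothetical middleware-side proxy holding checkpoints for one mobile host."""
    host_id: str
    reachable: bool = True                # hypothetical: False means this proxy cannot vote yes
    latest_state: Optional[dict] = None   # last state pushed by the mobile host
    tentative: Dict[int, dict] = field(default_factory=dict)
    committed: Dict[int, dict] = field(default_factory=dict)

    def receive_state(self, state: dict) -> None:
        # The only host-side work: push state to the proxy; nothing is stored on the device.
        self.latest_state = dict(state)

    def take_tentative(self, round_no: int) -> bool:
        # Phase 1: snapshot the host's latest pushed state and vote yes/no.
        if not self.reachable or self.latest_state is None:
            return False
        self.tentative[round_no] = dict(self.latest_state)
        return True

    def commit(self, round_no: int) -> None:
        # Phase 2 (success): the tentative checkpoint becomes permanent.
        self.committed[round_no] = self.tentative.pop(round_no)

    def discard(self, round_no: int) -> None:
        # Phase 2 (abort): drop the tentative checkpoint if one was taken.
        self.tentative.pop(round_no, None)


class Coordinator:
    """Hypothetical central coordinator driving two-phase checkpoint rounds."""

    def __init__(self, proxies: List[Proxy]) -> None:
        self.proxies = {p.host_id: p for p in proxies}
        self.round_no = 0
        self.last_committed: Optional[int] = None

    def checkpoint(self) -> bool:
        self.round_no += 1
        r = self.round_no
        # Ask every proxy (not every mobile host) for a tentative checkpoint.
        votes = [p.take_tentative(r) for p in self.proxies.values()]
        if all(votes):
            for p in self.proxies.values():
                p.commit(r)
            self.last_committed = r
            return True
        for p in self.proxies.values():
            p.discard(r)
        return False

    def rollback(self) -> Dict[str, dict]:
        # Restore the latest committed global snapshot from proxy storage only;
        # no mobile host is contacted.
        if self.last_committed is None:
            raise RuntimeError("no committed global checkpoint")
        r = self.last_committed
        return {hid: dict(p.committed[r]) for hid, p in self.proxies.items()}


if __name__ == "__main__":
    mh = [Proxy("mh1"), Proxy("mh2")]
    coord = Coordinator(mh)
    mh[0].receive_state({"step": 1})
    mh[1].receive_state({"step": 1})
    coord.checkpoint()            # round 1 commits
    mh[0].receive_state({"step": 2})
    mh[1].reachable = False
    coord.checkpoint()            # round 2 aborts
    print(coord.rollback())       # {'mh1': {'step': 1}, 'mh2': {'step': 1}}
```

Keeping both tentative and committed checkpoints on the proxy side is what keeps storage and processing off the mobile device in this sketch; an aborted round leaves the previous committed snapshot intact, so rollback never needs the hosts themselves.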
