Antecedence graph based checkpointing and recovery for mobile agents

Mobile agents are distributed programs that move autonomously through a network to perform tasks on behalf of a user. Although mobile agents offer much more flexibility than client-server computing, they incur additional costs and raise issues such as security, reliability and fault tolerance, which must be addressed before mobile agent technology can be adopted for developing real-life applications. Fault tolerance aims to provide reliable execution of agents even in the face of failures caused by migration request failures, communication exceptions, system crashes or security violations. Graph-based fault tolerance protocols have been used successfully to implement fault tolerance in distributed computing. This paper proposes the use of antecedence graphs and message logs for maintaining the fault tolerance information of mobile agents. To reduce the overhead of carrying fault tolerance information in the form of large antecedence graphs, we propose the use of a parallel checkpointing algorithm. For checkpointing, dependent agents are identified using antecedence graphs, and only these agents take part in the checkpointing process. In case of failure, the antecedence graphs and message logs are regenerated for recovery, and normal operation then continues. Analysis of the results shows considerable improvement in terms of reduced message overhead, execution time and recovery time compared with the existing graph-based approach.
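
The idea of selecting only dependent agents for a checkpoint can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simplified, agent-level antecedence graph in which a received message makes the receiver depend on the sender, and it treats the checkpoint participants as the initiator plus all of its transitive antecedents. The names `AntecedenceGraph`, `record_message` and `dependent_agents` are hypothetical.

```python
from collections import defaultdict, deque


class AntecedenceGraph:
    """Simplified agent-level antecedence graph (illustrative assumption)."""

    def __init__(self):
        # Maps each agent to the set of agents it has received messages from.
        self._antecedents = defaultdict(set)

    def record_message(self, sender, receiver):
        # Receiving a message makes the receiver depend on the sender.
        self._antecedents[receiver].add(sender)

    def dependent_agents(self, initiator):
        # Breadth-first walk over antecedence edges, starting at the initiator.
        seen = {initiator}
        queue = deque([initiator])
        while queue:
            agent = queue.popleft()
            for antecedent in self._antecedents.get(agent, ()):
                if antecedent not in seen:
                    seen.add(antecedent)
                    queue.append(antecedent)
        return seen


graph = AntecedenceGraph()
graph.record_message("A", "B")  # B depends on A
graph.record_message("B", "C")  # C depends on B, and transitively on A
graph.record_message("D", "E")  # unrelated pair of agents

# Only agents in this set would be asked to take a checkpoint with C.
participants = graph.dependent_agents("C")
```

In this sketch agents D and E exchange messages only with each other, so a checkpoint initiated by C does not involve them, which is the source of the reduced message overhead described above.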
