FAULT-TOLERANT MOBILE AGENTS COMPUTING

Reliability is a vital issue in the deployment of mobile agent systems (MASs), which are meant to provide a distributed computing infrastructure for applications whose components can move freely in heterogeneous environments. Designing and implementing mechanisms to relocate computations requires careful consideration of fault tolerance, an essential component of reliability, especially on open networks such as the Internet. Mobile agent (MA) fault tolerance requires mechanisms for making agents persistent, for reactivating them and their state after a failure, and for reliably transporting them between agent hosts (AHs). In this paper, we propose several mechanisms that address these problems. They are meant to tolerate host, communication, and agent failures on a network and to recover agents and AHs from them. They are based on a novel three-layered approach to fault tolerance, which avoids the single point of failure of centralized systems while still maintaining the scalability of distributed systems. The proposed techniques have been implemented and tested on PMADE, and the results of a comparison of these techniques with some existing ones are also reported.
