Exploiting non-determinism for reliability of mobile agent systems

An important technical hurdle blocking the adoption of mobile agent technology is the lack of reliability. Designing a reliable mobile agent system is especially challenging because a mobile agent can be affected by the failure of any host it visits or of any communication link it must traverse. Previous work in this domain has relied on techniques such as periodic checkpointing of mobile agent state and restarting once the failed machine or link recovers. Such approaches leave an agent unavailable until that machine or communication link recovers. In this paper, we take an alternative approach based on the premise that a mobile agent can often complete its task in more than one way. We capture this redundancy with non-deterministic constructs in the agent language and track the agent's actual computational path within its tree of possible computations. We design and implement a distributed recovery scheme that detects a failure, rolls back the agent's computation, and restarts the agent from an earlier point in its computational tree along a different but equivalent computational path, without waiting for the failure itself to be repaired.
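
The idea of rolling back to a choice point and continuing down an equivalent branch can be illustrated with a small single-process sketch. This is not the paper's agent language or its distributed recovery protocol: the node types `Visit`, `Seq`, and `Alt`, the `run` function, the `HostFailure` exception, and the host names are all hypothetical, and a host failure is simulated with a plain set of unreachable hosts. The sketch also only backtracks within the choice point currently being executed, which is a simplification.

```python
from dataclasses import dataclass
from typing import List, Set, Union


class HostFailure(Exception):
    """Raised when a step needs a host that is currently unreachable."""


@dataclass
class Visit:
    host: str  # hypothetical host name the agent must run on


@dataclass
class Seq:
    steps: List["Node"]  # steps executed in order


@dataclass
class Alt:
    branches: List["Node"]  # equivalent ways to do the same work


Node = Union[Visit, Seq, Alt]


def run(node: Node, failed: Set[str], path: List[str]) -> None:
    """Execute `node`, recording visited hosts in `path`.

    At an Alt node, a failure inside one branch rolls `path` back to the
    state it had at the choice point and tries the next branch.
    """
    if isinstance(node, Visit):
        if node.host in failed:
            raise HostFailure(node.host)
        path.append(node.host)
    elif isinstance(node, Seq):
        for step in node.steps:
            run(step, failed, path)
    elif isinstance(node, Alt):
        checkpoint = len(path)  # remember where this choice point began
        last_error = None
        for branch in node.branches:
            try:
                run(branch, failed, path)
                return  # this branch completed, so the choice is resolved
            except HostFailure as err:
                del path[checkpoint:]  # roll back the partial work
                last_error = err
        # Every equivalent branch failed: propagate to an enclosing Alt.
        raise last_error or HostFailure("no alternatives")


# Hypothetical task: the purchase can go through either of two shop/bank pairs.
task = Seq([
    Visit("home"),
    Alt([
        Seq([Visit("shopA"), Visit("bankA")]),
        Seq([Visit("shopB"), Visit("bankB")]),
    ]),
    Visit("home"),
])

path: List[str] = []
run(task, failed={"bankA"}, path=path)
print(path)  # ['home', 'shopB', 'bankB', 'home']
```

In this run, the agent reaches `shopA`, finds `bankA` unreachable, discards the partial branch, and completes the task through `shopB` and `bankB` instead of waiting for `bankA` to come back.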
