FANTOMAS: Fault Tolerance for Mobile Agents in Clusters

Efficient use of cluster systems requires a suitable programming and operating environment. In this context, mobile agents are of growing interest as a basis for distributed and parallel applications. As mobile, autonomous software units, mobile agents can execute the tasks given to the system and independently allocate all the resources they need. However, as clusters grow, so does the probability that one or more system components fail, and with it the risk of losing mobile agents. While fault tolerance for applications based on "traditional" processes has been studied extensively, current agent environments provide few extensions, if any, for reacting adequately to such failures. We examine fault tolerance with regard to the properties and requirements of mobile agents and find that independent checkpointing with receiver-based message logging is appropriate in this context. We derive the FANTOMAS (Fault-Tolerant approach for Mobile Agents) design, which offers user-transparent fault tolerance that can be activated on request, according to the needs of the task. A theoretical analysis examines the advantages and drawbacks of FANTOMAS.
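The combination named above, independent checkpointing with receiver-based message logging, can be illustrated with a minimal sketch. This is not the FANTOMAS implementation: the class names `Agent` and `LoggingReceiver`, the in-memory list standing in for stable storage, and the toy running-sum state are all assumptions made for illustration. The sketch also assumes that message handling is deterministic, so that replaying logged messages reproduces the lost state.

```python
import copy


class Agent:
    """Hypothetical toy agent: its state is a running sum of received messages."""

    def __init__(self):
        self.total = 0
        self.processed = 0  # number of messages applied to the state

    def handle(self, msg):
        self.total += msg
        self.processed += 1


class LoggingReceiver:
    """Hypothetical receiver-side wrapper that logs messages and checkpoints the agent."""

    def __init__(self, agent):
        self.agent = agent
        self.log = []           # stands in for stable storage (assumption)
        self.checkpoint = None  # (agent snapshot, log position at snapshot time)

    def deliver(self, msg):
        # Receiver-based logging: record the message before the agent processes it.
        self.log.append(msg)
        self.agent.handle(msg)

    def take_checkpoint(self):
        # Independent checkpoint: taken locally, without coordinating with other agents.
        self.checkpoint = (copy.deepcopy(self.agent), len(self.log))

    def recover(self):
        # Restore the last snapshot (or a fresh agent if none exists),
        # then replay every message logged after that snapshot.
        if self.checkpoint is None:
            agent, position = Agent(), 0
        else:
            snapshot, position = self.checkpoint
            agent = copy.deepcopy(snapshot)
        for msg in self.log[position:]:
            agent.handle(msg)
        self.agent = agent
        return agent


def demo():
    receiver = LoggingReceiver(Agent())
    receiver.deliver(1)
    receiver.deliver(2)
    receiver.take_checkpoint()
    receiver.deliver(3)
    receiver.agent = None  # simulate losing the agent in a node failure
    return receiver.recover()
```

In this sketch, calling `demo()` restores the snapshot taken after the first two messages and replays only the third. Because each receiver checkpoints on its own schedule, no global coordination is needed; the message log is what allows recovery to reach the state the agent held just before the failure.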
