The performance of checkpointing and replication schemes for fault tolerant mobile agent systems

We evaluate the performance of checkpointing and replication schemes for fault-tolerant mobile agent systems. For a quantitative comparison, we implemented an experimental system on top of the Mole mobile agent system and built a simulation system to cover various failure cases. The experiments aim to give insight into the behavior of agents under the two schemes and to provide a guideline for fault-tolerant system design. The results show that the checkpointing scheme performs very stably, whereas for the replication scheme the controllable system parameters must be chosen carefully to achieve the desired performance.
