CIC: an integrated approach to checkpointing in mobile agent systems

As a widely used fault tolerance technique, checkpointing has evolved into several schemes: independent, coordinated, and communication-induced (CIC). Independent and coordinated checkpointing have been adopted in many works on fault-tolerant mobile agent (MA) systems. However, CIC, a flexible, efficient, and scalable checkpointing scheme, has not yet been applied to MA systems. Based on an analysis of mobile agent behavior, we argue that CIC is well suited to MA systems: it not only establishes consistent recovery lines efficiently but also integrates well with the independent checkpointing used for reliable MA migration. We further propose an improvement to CIC, the deferred message processing based CIC algorithm (DM-CIC), which achieves higher efficiency by exempting the CIC algorithm from taking forced checkpoints in MA systems. Simulation results show that DM-CIC is stable and better suited to large-scale MA systems.
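The abstract does not spell out either protocol, so the following is only a minimal sketch under stated assumptions. It assumes the classic index-based CIC rule (in the style of Briatico, Ciuffoletti and Simoncini): every message piggybacks the sender's current checkpoint index, and a receiver whose index is lower must take a forced checkpoint before processing the message. It also assumes that "deferred message processing" means holding such a message until the agent's next basic checkpoint (for example, the one it already takes before migrating) instead of forcing a checkpoint. The class `Agent`, the function `run`, and the two-agent scenario are hypothetical illustrations, not the published DM-CIC algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    index: int  # sender's checkpoint index, piggybacked on the message


@dataclass
class Agent:
    """Hypothetical agent running an index-based CIC rule (assumed BCS-style)."""
    name: str
    deferred: bool = False  # True: assumed DM-CIC behavior (defer, never force)
    index: int = 0          # index of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs
    delivered: list = field(default_factory=list)    # processed payloads
    buffer: list = field(default_factory=list)       # messages held back

    def send(self, payload: str) -> Message:
        return Message(payload, self.index)

    def receive(self, msg: Message) -> None:
        if msg.index <= self.index:
            # Receiver is already "after" the sender's checkpoint: no orphan risk.
            self.delivered.append(msg.payload)
        elif self.deferred:
            # Assumed deferral: hold the message until the next basic checkpoint.
            self.buffer.append(msg)
        else:
            # Classic CIC: forced checkpoint before processing the message.
            self.index = msg.index
            self.checkpoints.append((self.index, "forced"))
            self.delivered.append(msg.payload)

    def basic_checkpoint(self) -> None:
        """Checkpoint on the agent's own schedule, e.g. before migrating."""
        target = self.index + 1
        if self.buffer:
            # Jump far enough that every held message becomes safe to process.
            target = max(target, max(m.index for m in self.buffer))
        self.index = target
        self.checkpoints.append((self.index, "basic"))
        for m in self.buffer:
            self.delivered.append(m.payload)
        self.buffer.clear()

    def forced_count(self) -> int:
        return sum(1 for _, kind in self.checkpoints if kind == "forced")


def run(deferred: bool):
    """Hypothetical scenario: A migrates twice and messages B in between."""
    a, b = Agent("A", deferred), Agent("B", deferred)
    a.basic_checkpoint()     # A checkpoints before migrating: index 1
    b.receive(a.send("m1"))  # B is still at index 0
    a.basic_checkpoint()     # A migrates again: index 2
    b.receive(a.send("m2"))
    b.basic_checkpoint()     # B checkpoints before its own migration
    return a, b


if __name__ == "__main__":
    for mode in (False, True):
        _, b = run(deferred=mode)
        print(f"deferred={mode}: forced checkpoints at B = {b.forced_count()}")
    # prints:
    # deferred=False: forced checkpoints at B = 2
    # deferred=True: forced checkpoints at B = 0
```

In this sketch both variants keep the same invariant: a message carrying index s is processed only once the receiver's index is at least s, which keeps checkpoints with equal indices free of orphan messages. The trade-off is that deferral removes the forced checkpoints but delays the held messages until the receiver's next basic checkpoint, a cost that stays bounded only if agents checkpoint (for instance, migrate) often enough.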
