Semantic-compensation-based recovery in multi-agent systems

In agent systems, an agent's recovery from execution problems is often complicated by constraints that are not present in a more traditional distributed database systems environment. An analysis of agent-related crash recovery issues is presented, and requirements for achieving 'acceptable' agent crash recovery are discussed. Motivated by this analysis, a novel approach to managing agent recovery is presented. It utilises an event- and task-driven model for employing semantic compensation, task retries, and checkpointing. The compensation/retry model requires a situated model of action and failure, and provides the agent with an emergent, unified treatment of both crash recovery and run-time failure handling. This approach helps the agent to recover acceptably from crashes and execution problems; improve system predictability; manage inter-task dependencies; and address the way in which exogenous events or crashes can trigger the need for a re-decomposition of a task. An agent architecture is then presented, which uses pair processing to leverage these recovery techniques and increase the agent's availability on crash restart.
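
As a rough illustration of how task retries, semantic compensation, and checkpointing can fit together, the following minimal sketch runs a sequence of tasks, retries a failing task a bounded number of times, and, if it still fails, compensates the already completed tasks in reverse order. It is not the paper's model: the names `Task`, `run_with_recovery`, `max_retries`, and the in-memory `log` standing in for a checkpoint store are all assumptions made for this example, and the situated, event-driven aspects described above are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical task record: a forward action plus its semantic undo."""
    name: str
    action: Callable[[], None]       # raises an exception on failure
    compensate: Callable[[], None]   # semantically reverses a completed action
    max_retries: int = 2             # retries allowed after the first attempt


def run_with_recovery(tasks: List[Task], log: List[str]) -> bool:
    """Run tasks in order; retry failures, then compensate completed work.

    `log` is an in-memory stand-in for a durable checkpoint log (assumption).
    Returns True if every task completed, False if compensation was needed.
    """
    done: List[Task] = []
    for task in tasks:
        for attempt in range(1, task.max_retries + 2):
            try:
                task.action()
                log.append(f"done:{task.name}")  # checkpoint the completion
                done.append(task)
                break
            except Exception:
                log.append(f"fail:{task.name}:{attempt}")
        else:
            # All attempts failed: undo completed tasks, most recent first.
            for finished in reversed(done):
                finished.compensate()
                log.append(f"comp:{finished.name}")
            return False
    return True


def always_fails() -> None:
    raise RuntimeError("hotel unavailable")


# Example run: the second task never succeeds, so the first is compensated.
log: List[str] = []
ok = run_with_recovery(
    [
        Task("book_flight", lambda: None, lambda: None),
        Task("book_hotel", always_fails, lambda: None, max_retries=1),
    ],
    log,
)
```

In this run `ok` is `False`, and `log` records the completed flight booking, two failed hotel attempts, and the compensation of the flight booking. After a crash, a restarted agent could in principle read such a log to decide which tasks to retry and which to compensate, though a real system would need durable storage for it.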
