Adaptive replication of large-scale multi-agent systems

In order to construct and deploy large-scale multi-agent systems, we must address one of the fundamental issues of distributed systems: the possibility of partial failures. Fault tolerance is therefore an unavoidable concern for large-scale multi-agent systems. In this paper, we discuss the issues involved and propose an approach to fault tolerance for multi-agent systems. The starting idea is to apply replication strategies to agents, replicating the most critical agents to prevent failures. Because the criticality of agents may evolve during computation and problem solving, and because resources are bounded, we need to adapt the number of replicas of each agent dynamically and automatically in order to maximize their reliability and availability. We describe our approach and the related mechanisms for evaluating the criticality of a given agent (based on application-level semantic information, e.g., interdependencies, and on system-level statistical information, e.g., communication load), for deciding which strategy to apply (e.g., active or passive replication), and for parameterizing it (e.g., the number of replicas). We also report on experiments conducted with our prototype architecture, named DimaX.
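The abstract does not say how criticality is computed or how a bounded replica budget is shared out. As an illustration only, the Python sketch below combines one semantic measure (the number of agents that depend on an agent) and one system measure (its observed message rate) into a weighted criticality score, then splits a fixed budget of extra replicas across agents in proportion to that score using largest-remainder rounding. The weights, the normalization, the 0.7 threshold for choosing active over passive replication, and all names (`AgentStats`, `criticality`, `allocate_replicas`) are hypothetical assumptions, not DimaX's actual mechanism.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    name: str
    dependents: int      # hypothetical semantic measure: agents that depend on this one
    msgs_per_sec: float  # hypothetical system measure: observed communication load


def criticality(agent, max_dep, max_load, w_sem=0.5, w_sys=0.5):
    """Weighted sum of normalized semantic and system scores (assumed formula)."""
    sem_score = agent.dependents / max_dep if max_dep else 0.0
    sys_score = agent.msgs_per_sec / max_load if max_load else 0.0
    return w_sem * sem_score + w_sys * sys_score


def allocate_replicas(agents, budget, active_threshold=0.7):
    """Return {name: (total_copies, strategy)}; `budget` counts extra replicas only."""
    if not agents:
        return {}
    max_dep = max(a.dependents for a in agents)
    max_load = max(a.msgs_per_sec for a in agents)
    crit = {a.name: criticality(a, max_dep, max_load) for a in agents}
    total = sum(crit.values())

    if total == 0 or budget <= 0:
        extra = {n: 0 for n in crit}
    else:
        # Proportional share of the budget, floored first.
        raw = {n: budget * c / total for n, c in crit.items()}
        extra = {n: int(r) for n, r in raw.items()}
        # Largest-remainder rounding: leftover replicas go to the biggest fractions.
        leftover = budget - sum(extra.values())
        by_fraction = sorted(raw, key=lambda n: raw[n] - extra[n], reverse=True)
        for n in by_fraction[:leftover]:
            extra[n] += 1

    plan = {}
    for n, k in extra.items():
        # Assumed rule: highly critical agents get active replication (fast failover),
        # others get cheaper passive replication; unreplicated agents get none.
        if k == 0:
            strategy = "none"
        elif crit[n] >= active_threshold:
            strategy = "active"
        else:
            strategy = "passive"
        plan[n] = (1 + k, strategy)  # 1 = the original agent itself
    return plan


agents = [
    AgentStats("planner", dependents=4, msgs_per_sec=10.0),
    AgentStats("broker", dependents=2, msgs_per_sec=40.0),
    AgentStats("logger", dependents=0, msgs_per_sec=2.0),
]
plan = allocate_replicas(agents, budget=5)
print(plan)
# {'planner': (3, 'passive'), 'broker': (4, 'active'), 'logger': (1, 'none')}
```

Recomputing the plan whenever the monitored dependencies or message rates change is one way to obtain the dynamic, automatic adaptation the abstract describes; the proportional split keeps the total number of replicas within the resource bound.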