Towards reliable multi-agent systems: An adaptive replication mechanism

Distributed cooperative applications are increasingly designed as sets of autonomous entities, called agents, that interact and coordinate with one another (together forming a multi-agent system). Such applications are often highly dynamic: agents can join or leave, change roles or strategies, and so on. This dynamicity poses new challenges for traditional approaches to fault tolerance. In this paper, we focus on crash failures and on the usual preventive approach, replication. Because the criticality of an agent may evolve during computation and problem solving, a static, design-time choice of what to replicate is not appropriate. We therefore need to identify the most critical agents dynamically and automatically, and to adapt their replication strategies (e.g., active or passive replication, number of replicas) in order to maximize their reliability and availability. We describe a prototype architecture that supports adaptive replication. We also discuss and compare control strategies for replication, one using agent roles and another using inter-agent dependences as the information from which to estimate the criticality of agents. Experiments and measurements are also reported.
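As an illustration of the idea of adapting replication to criticality, the following minimal sketch distributes a fixed budget of extra replicas among agents in proportion to their estimated criticality, and selects active replication only for agents holding a large share of the total. It is not the mechanism of the paper: the names `ReplicationPlan` and `plan_replication`, the proportional allocation rule, and the `active_threshold` parameter are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class ReplicationPlan:
    replicas: int   # total number of copies of the agent, including the original
    strategy: str   # "active" or "passive"


def plan_replication(criticality, budget, active_threshold=0.5):
    """Hypothetical allocation rule (an assumption, not the paper's algorithm).

    criticality: dict mapping agent name -> non-negative criticality estimate
    budget: total number of extra replicas available to the system
    active_threshold: share of total criticality above which an agent
                      is replicated actively rather than passively
    """
    total = sum(criticality.values())
    plans = {}
    for agent, weight in criticality.items():
        if total > 0:
            share = weight / total
            # Proportional share of the budget, rounded down.
            extra = math.floor(weight * budget / total)
        else:
            share = 0.0
            extra = 0
        strategy = "active" if share >= active_threshold else "passive"
        plans[agent] = ReplicationPlan(replicas=1 + extra, strategy=strategy)
    return plans


if __name__ == "__main__":
    # Made-up criticality estimates, for illustration only.
    estimates = {"broker": 6.0, "worker": 3.0, "logger": 1.0}
    for name, plan in plan_replication(estimates, budget=10).items():
        print(name, plan)
```

Calling `plan_replication` again whenever the criticality estimates change (for example, after an agent changes role) yields an updated plan, which is the sense in which such a scheme is adaptive rather than fixed at design time.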
