Computing the fault tolerance of multi-agent deployment

A deployment of a multi-agent system on a network is a placement of one or more copies of each agent on network hosts such that the memory constraints of each node are satisfied. Finding the deployment that is most likely to tolerate faults (i.e., to have at least one copy of each agent functioning and in communication with the other agents) is a challenge. In this paper, we address the problem of computing the survival probability of a deployment (i.e., the probability that the deployment will tolerate faults), under the assumption that node failures are independent. We show that computing the survival probability of a deployment is at least NP-hard and, moreover, hard to approximate. We present two algorithms that accurately compute the survival probability of a deployment; as expected, these algorithms are exponential. We also present five heuristic algorithms that estimate survival probabilities in acceptable time frames. We report on a detailed set of experiments to determine the conditions under which some of these algorithms perform better than others.
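To make the notion of survival probability concrete, the following is a minimal brute-force sketch, not one of the paper's algorithms. It assumes a simplified model in which a deployment survives whenever every agent has at least one copy on a node that is still up; the communication requirement between agents is ignored. The function name `survival_probability`, the `deployment` and `fail_prob` dictionaries, and the sample values are all hypothetical. The enumeration visits every combination of node states, so it is exponential in the number of nodes.

```python
from itertools import product


def survival_probability(deployment, fail_prob):
    """Brute-force survival probability under independent node failures.

    deployment: dict mapping agent name -> set of nodes hosting a copy (assumed format)
    fail_prob:  dict mapping node name -> probability that the node fails (assumed format)
    Simplification: connectivity between surviving agents is not checked.
    """
    nodes = sorted(fail_prob)
    total = 0.0
    # Enumerate every up/down assignment of the nodes (2^n cases).
    for states in product([True, False], repeat=len(nodes)):
        up = {n for n, alive in zip(nodes, states) if alive}
        # Probability of this exact assignment, using independence.
        p = 1.0
        for n, alive in zip(nodes, states):
            p *= (1.0 - fail_prob[n]) if alive else fail_prob[n]
        # The deployment survives if each agent keeps at least one live copy.
        if all(hosts & up for hosts in deployment.values()):
            total += p
    return total


if __name__ == "__main__":
    # Hypothetical example: agent "a" is replicated on two nodes, agent "b" on one.
    deployment = {"a": {"n1", "n2"}, "b": {"n2"}}
    fail_prob = {"n1": 0.1, "n2": 0.2}
    print(survival_probability(deployment, fail_prob))
```

In this example agent "b" lives only on n2, so the deployment survives exactly when n2 stays up, and replicating "a" on n1 adds nothing to the result.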
