Computing the Fault Tolerant Capability of Multiagent Deployment

A deployment of a multiagent system on a network is a placement of one or more copies of each agent on network hosts such that the memory constraints of each node are satisfied. Finding the deployment that is most likely to tolerate faults (i.e., to keep at least one copy of each agent functioning and in communication with the other agents) is a challenge. In this paper, we address the problem of computing the probability of survival of a deployment (i.e., the probability that the deployment will tolerate faults), under the assumption that node failures are independent. We show that computing the survival probability of a deployment is at least NP-hard and, moreover, that it is hard to approximate. We present two algorithms that compute the probability of survival of a deployment exactly; as expected, these algorithms take exponential time. We also present five heuristic algorithms that estimate survival probabilities and run in acceptable time frames. We report on a detailed set of experiments that determine the conditions under which some of these algorithms perform better than the others.
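
To make the notion of survival probability concrete, the following is a minimal brute-force sketch, not one of the paper's algorithms. It assumes a simplified model in which a deployment survives whenever every agent has at least one copy on a node that is still up; the requirement that surviving agents can also communicate is deliberately ignored. The names `survival_probability`, `deployment`, and `fail_prob` are hypothetical. The enumeration visits all 2^n node states, which illustrates why exact computation is expected to be exponential.

```python
from itertools import product


def survival_probability(deployment, fail_prob):
    """Exact survival probability by enumerating every up/down node pattern.

    deployment: dict mapping node name -> set of agents hosted on that node
    fail_prob:  dict mapping node name -> independent failure probability
    Simplifying assumption: connectivity between surviving nodes is ignored.
    """
    nodes = list(deployment)
    agents = set().union(*deployment.values())
    total = 0.0
    for state in product([True, False], repeat=len(nodes)):
        p = 1.0        # probability of this particular up/down pattern
        alive = set()  # agents with at least one copy on a surviving node
        for node, up in zip(nodes, state):
            p *= (1.0 - fail_prob[node]) if up else fail_prob[node]
            if up:
                alive |= deployment[node]
        if alive >= agents:  # every agent still has a live copy
            total += p
    return total


# Hypothetical three-node deployment of two agents, "a" and "b".
deployment = {"n1": {"a", "b"}, "n2": {"a"}, "n3": {"b"}}
fail_prob = {"n1": 0.1, "n2": 0.2, "n3": 0.3}
result = survival_probability(deployment, fail_prob)
print(result)
```

In this toy instance the deployment survives if n1 is up (probability 0.9), or if n1 is down while both n2 and n3 are up (0.1 × 0.8 × 0.7 = 0.056), so the result is about 0.956.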
