Multiagent-Based Fault Tolerance Management for Robustness

Despite software engineering best practices and tools, it would be very risky to assume that software developed today is fault-free. Moreover, software may face unexpected situations that were not considered during its design. Robustness is a highly desirable and sometimes indispensable software requirement, especially for critical systems, where the consequences of a failure can be catastrophic. This chapter outlines existing fault tolerance techniques and then discusses the potential of multiagent systems to enhance the design of robust, fault-tolerant systems, thereby improving the reliability of large-scale, critical, and complex systems.
