Software Fault Tolerance

A fault-tolerant software unit consists of N ≥ 2 diverse member units, usually developed by N separate teams, together with an execution environment. The development process uses diversity requirements, communication protocols, and inter-team isolation rules to maximize the independence of the teams' efforts and the diversity of their products. The paper discusses the principal models, specification, building, evaluation, and system integration of fault-tolerant software, and outlines goals for future work.
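The structure described above, N diverse member units plus an execution environment that reconciles their outputs, can be illustrated with a minimal sketch. It assumes the member versions are ordinary Python callables run one after another on the same input, and that the execution environment applies exact-match majority voting. Real environments typically run the versions concurrently with synchronization and timeouts, and may use inexact voting for numeric results. The names `run_n_version`, `NoMajorityError`, and the three `isqrt_*` versions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


class NoMajorityError(Exception):
    """Raised when no strict majority of the N member versions agrees."""


def run_n_version(versions: Sequence[Callable[..., T]], *args) -> T:
    """Hypothetical execution environment: run each version, then vote.

    Assumes exact-match majority voting; a version that raises an
    exception casts no vote instead of bringing down the whole unit.
    """
    if len(versions) < 2:
        raise ValueError("N-version execution needs N >= 2 member versions")
    results = []
    for version in versions:
        try:
            results.append(version(*args))
        except Exception:
            # A failed member version is treated as a missing vote.
            continue
    if not results:
        raise NoMajorityError("all member versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require a strict majority of all N versions, not just the survivors.
    if count * 2 <= len(versions):
        raise NoMajorityError(f"only {count} of {len(versions)} versions agree")
    return value


# Three hypothetical "diverse" versions of floor(sqrt(n)) for n >= 0.
def isqrt_newton(n: int) -> int:
    # Integer Newton iteration, decreasing monotonically to the floor root.
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_bisect(n: int) -> int:
    # Bisection with invariant lo*lo <= n < hi*hi.
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_faulty(n: int) -> int:
    # Deliberately seeded design fault: off by one.
    return int(n ** 0.5) + 1


print(run_n_version([isqrt_newton, isqrt_bisect, isqrt_faulty], 10))  # → 3
```

Here the faulty version is outvoted two to one. Counting the majority against all N versions, rather than only those that returned a value, means that if one version crashes and the remaining two disagree, the unit reports a failure instead of silently accepting a single unconfirmed result.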
