Software Fault Tolerance: A Tutorial

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. Compounding the problems in building correct software is the difficulty of assessing the correctness of software for highly complex systems. After a brief overview of software development processes, we note how hard-to-detect design faults are likely to be introduced during development and how software faults tend to be state-dependent and activated by particular input sequences. Although component reliability is an important quality measure for system-level analysis, software reliability is hard to characterize, and the use of post-verification reliability estimates remains controversial. For some applications software safety is more important than reliability, and the fault tolerance techniques used in those applications are aimed at preventing catastrophes. The single-version software fault tolerance techniques discussed include system structuring and closure, atomic actions, inline fault detection, exception handling, and others. Multiversion techniques are based on the assumption that software built differently should fail differently; thus, if one of the redundant versions fails, at least one of the other versions is expected to provide an acceptable output. Recovery blocks, N-version programming, and other multiversion techniques are reviewed.
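
The two multiversion techniques named above can be illustrated with a minimal sketch. It is not taken from the tutorial: the function names (`recovery_block`, `n_version_vote`), the integer square root task, the seeded fault, and the exact-match majority voter are all assumptions chosen for illustration. A recovery block tries alternates in order and returns the first result that passes an acceptance test; N-version programming runs every version and accepts only an output that a strict majority agrees on.

```python
import math
from collections import Counter


def isqrt_builtin(n):
    """Version 1: the standard library's integer square root."""
    return math.isqrt(n)


def isqrt_linear(n):
    """Version 2: an independently written linear search."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_faulty(n):
    """Version 3: hypothetical version with a seeded, input-dependent fault."""
    return math.isqrt(n) + (1 if n == 16 else 0)


def acceptance_test(n, r):
    """Accept r only if it is the floor of the square root of n."""
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)


def recovery_block(n, alternates, test):
    """Try each alternate in order; return the first result the test accepts."""
    for alternate in alternates:
        try:
            result = alternate(n)
        except Exception:
            continue  # a raised exception counts as a failed alternate
        if test(n, result):
            return result
    raise RuntimeError("every alternate failed the acceptance test")


def n_version_vote(n, versions):
    """Run all versions; return the output a strict majority agrees on."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(n))
        except Exception:
            pass  # a crashed version simply casts no vote
    if outputs:
        value, count = Counter(outputs).most_common(1)[0]
        if count > len(versions) // 2:
            return value
    raise RuntimeError("no majority agreement among versions")


# The faulty primary is rejected and the first acceptable alternate is used.
rb_result = recovery_block(16, [isqrt_faulty, isqrt_builtin], acceptance_test)

# The two correct versions outvote the faulty one.
nvp_result = n_version_vote(16, [isqrt_builtin, isqrt_linear, isqrt_faulty])
```

In this sketch both calls mask the seeded fault only because the versions fail on different inputs. If the versions shared a fault, the voter would agree on a wrong answer, which is why the assumption that versions fail differently carries so much weight.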

[1]  D. E. Eckhardt,et al.  An analysis of the effects of coincident errors on multi-version software , 1985 .

[2]  S. A. Doyle,et al.  Comparative analysis of two architectural alternatives for the N-version programming (NVP) system , 1995, Annual Reliability and Maintainability Symposium 1995 Proceedings.

[3]  Andy Hills,et al.  Fault tolerant avionics , 1988 .

[4]  Jean Arlat,et al.  Fault Injection for Dependability Validation: A Methodology and Some Applications , 1990, IEEE Trans. Software Eng..

[5]  I-Ling Yen Specialized N-modular redundant processors in large-scale distributed systems , 1996, Proceedings 15th Symposium on Reliable Distributed Systems.

[6]  Stuart Bennett,et al.  Experimental comparison of voting algorithms in cases of disagreement , 1997, EUROMICRO 97. Proceedings of the 23rd EUROMICRO Conference: New Frontiers of Information Technology (Cat. No.97TB100167).

[7]  Stacy J. Prowell,et al.  Cleanroom software engineering: technology and process , 1999 .

[8]  J. N. Chelotti,et al.  A software fault tolerance experiment for space applications , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[9]  Michael R. Lyu,et al.  Software diversity metrics and measurements , 1992, [1992] Proceedings. The Sixteenth Annual International Computer Software and Applications Conference.

[10]  M.C. McElvany,et al.  Guaranteeing deadlines in MAFT , 1988, Proceedings. Real-Time Systems Symposium.

[11]  Peter J. Denning,et al.  Fault Tolerant Operating Systems , 1976, CSUR.

[12]  R. B. Broen New Voters for Redundant Systems , 1975 .

[13]  Thomas C. Bressoud,et al.  TFT: a software system for application-transparent fault tolerance , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[14]  Hoang Pham,et al.  Fault-Tolerant Software Systems: Techniques and Applications , 1992 .

[15]  A. K. Caglayan,et al.  Systems approach to software fault tolerance , 1985 .

[16]  Suku Nair,et al.  Software fault tolerance for distributed object based computing , 1997, J. Syst. Softw..

[17]  S S Brilliant,et al.  The consistent comparison problem in N-version software , 1987, SOEN.

[18]  K. H. Kim,et al.  Fault-tolerant real-time objects , 1997, CACM.

[19]  Behrooz Parhami Design of reliable software via general combination of N-version programming and acceptance testing , 1996, Proceedings of ISSRE '96: 7th International Symposium on Software Reliability Engineering.

[20]  Joseph Sifakis,et al.  Formal methods for the validation of fault tolerance in autonomous spacecraft , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[21]  J.H. Lala,et al.  A design approach for ultrareliable real-time systems , 1991, Computer.

[22]  Rami G. Melhem,et al.  Implementation of a transient-fault-tolerance scheme on DEOS-a technology transfer from an academic system to an industrial system , 1999, Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium.

[23]  T. Anderson,et al.  An Evaluation of Software Fault Tolerance in a Practical System , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..

[24]  David F. McAllister,et al.  An Experimental Evaluation of Software Redundancy as a Strategy For Improving Reliability , 1991, IEEE Trans. Software Eng..

[25]  Algirdas A. Avi The Methodology of N-Version Programming , 1995 .

[26]  Niraj K. Jha,et al.  Fault-tolerant computer system design , 1996, IEEE Parallel & Distributed Technology: Systems & Applications.

[27]  C. A. Babikyan The fault tolerant parallel processor operating system concepts and performance measurement overview , 1990, 9th IEEE/AIAA/NASA Conference on Digital Avionics Systems.

[28]  Jie Wu,et al.  A uniform approach to software and hardware fault tolerance , 1994, J. Syst. Softw..

[29]  Bev Littlewood,et al.  Validation of ultrahigh dependability for software-based systems , 1993, CACM.

[30]  Gary McGraw,et al.  Software fault injection: inoculating programs against errors , 1997 .

[31]  Yann-Hang Lee,et al.  An integrated scheduling mechanism for fault-tolerant modular avionics systems , 1998, 1998 IEEE Aerospace Conference Proceedings (Cat. No.98TH8339).

[32]  C. Subramanian,et al.  Performance analysis of voting strategies for a fly-by-wire system of a fighter aircraft , 1989 .

[33]  Allen P. Nikora,et al.  Applying software reliability engineering in the 1990s , 1998 .

[34]  R.C. Taylor,et al.  A flexible fault tolerant processor for launch vehicle avionics systems , 1990, 9th IEEE/AIAA/NASA Conference on Digital Avionics Systems.

[35]  Ravishankar K. Iyer,et al.  Software Dependability in the Tandem GUARDIAN System , 1995, IEEE Trans. Software Eng..

[36]  Joe Marshall,et al.  Measuring robustness of a fault tolerant aerospace system , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[37]  Rami G. Melhem,et al.  Reducing message overhead in TMR systems , 1999, Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003).

[38]  Francesca Saglietti Software diversity metrics quantifying dissimilarity in the input partition , 1990, Softw. Eng. J..

[39]  Kam S. Tso,et al.  Ada95 object-oriented and real-time support for development of software fault tolerance reusable components , 1996, Proceedings of WORDS'96. The Second Workshop on Object-Oriented Real-Time Dependable Systems.

[40]  A. Avizienis Dependable computing depends on structured fault tolerance , 1995, Proceedings of Sixth International Symposium on Software Reliability Engineering. ISSRE'95.

[41]  David F. McAllister,et al.  Fault-Tolerant SoFtware Reliability Modeling , 1987, IEEE Transactions on Software Engineering.

[42]  Roger S. Pressman,et al.  Software Engineering: A Practitioner's Approach , 1982 .

[43]  Hiroaki Takada,et al.  The multi-layered design diversity architecture: application of the design diversity approach to multiple system layers , 1992, Proceedings [1992] The Ninth TRON Project Symposium.

[44]  Pascal Traverse,et al.  AIRBUS A320/A330/A340 electrical flight controls - A family of fault-tolerant systems , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[45]  Nancy G. Leveson,et al.  An experimental evaluation of the assumption of independence in multiversion programming , 1986, IEEE Transactions on Software Engineering.

[46]  L. N. Simcox Software Fault Tolerance , 1988 .

[47]  Dhiraj K. Pradhan,et al.  Fault-tolerant computer system design , 1996 .

[48]  Jean-Claude Laprie How Much is Safety Worth? , 1994, IFIP Congress.

[49]  Russ Abbott,et al.  Resourceful systems for fault tolerance, reliability, and safety , 1990, CSUR.

[50]  Jie Wu Software fault tolerance using hierarchical N-version programming , 1991, IEEE Proceedings of the SOUTHEASTCON '91.

[51]  J.L. Gersting,et al.  A comparison of voting algorithms for n-version programming , 1991, Proceedings of the Twenty-Fourth Annual Hawaii International Conference on System Sciences.

[52]  Ken Sakamura,et al.  Design fault tolerance in operating systems based on a standardization project , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[53]  Michael R. Lyu,et al.  Software fault tolerance in a clustered architecture: techniques and reliability modeling , 1999, 1999 IEEE Aerospace Conference. Proceedings (Cat. No.99TH8403).

[54]  Algirdas Avizienis,et al.  A design paradigm for fault-tolerant systems , 1987 .

[55]  Gerald M. Masson,et al.  Using certification trails to achieve software fault tolerance , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[56]  Frederick P. Brooks,et al.  No Silver Bullet: Essence and Accidents of Software Engineering , 1987 .

[57]  H. Mori,et al.  Fault tolerant real-time operating system for 32 bit microprocessor V60/V70 , 1988 .

[58]  Philippe David,et al.  Development of a fault tolerant computer system for the HERMES space shuttle , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[59]  J. Xu,et al.  Toward an object-oriented approach to software fault tolerance , 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems.

[60]  Myron Hecht,et al.  Fault-tolerance in software , 1995 .

[61]  D. J. Taylor,et al.  A Compendium of Robust Data Structures , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..

[62]  J. Fairclough,et al.  Software engineering guides , 1996 .

[63]  Brian Randell,et al.  The Evolution of the Recovery Block Concept , 1994 .

[64]  Douglas M. Blough,et al.  A comparison of voting strategies for fault-tolerant distributed systems , 1990, Proceedings Ninth Symposium on Reliable Distributed Systems.

[65]  Mark Russinovich,et al.  Fault-tolerance for off-the-shelf applications and hardware , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[66]  Rachid Guerraoui,et al.  Software-Based Replication for Fault Tolerance , 1997, Computer.

[67]  Bev Littlewood,et al.  Predictably Dependable Computing Systems , 2012, ESPRIT Basic Research Series.

[68]  G. E. Migneault On requirements for software fault tolerance for flight controls , 1983 .

[69]  Daniel P. Siewiorek,et al.  Comparing operating systems using robustness benchmarks , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.

[70]  I. Lee,et al.  Measurement-based evaluation of operating system fault tolerance , 1993 .

[71]  Roy A. Maxion,et al.  Improving software robustness with dependability cases , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[72]  K. S. Tso,et al.  Error Recovery in Multi-Version Software , 1986 .

[73]  Francesca Saglietti,et al.  Software Diversity—Some Considerations About its Benefits and its Limitations , 1986 .

[74]  Barry W. Johnson,et al.  An operating system for a fault-tolerant multiprocessor controller , 1988, IEEE Micro.

[75]  Tippure S. Sundresh Software hardening-unifying software reliability strategies , 1998, SMC'98 Conference Proceedings. 1998 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.98CH36218).

[76]  Chris J. Walter Evaluation and design of an ultra-reliable distributed architecture for fault tolerance , 1990 .

[77]  Chris J. Walter,et al.  The MAFT Architecture for Distributed Fault Tolerance , 1988, IEEE Trans. Computers.

[78]  R. M. Kieckhafer,et al.  Fault-tolerant real-time task scheduling in the MAFT distributed system , 1989, [1989] Proceedings of the Twenty-Second Annual Hawaii International Conference on System Sciences. Volume 1: Architecture Track.

[79]  G. B. Finelli,et al.  The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software , 1993, IEEE Trans. Software Eng..

[80]  K. S. Tso,et al.  Development of software fault-tolerant applications with Ada95 object-oriented support , 1996, Proceedings of the IEEE 1996 National Aerospace and Electronics Conference NAECON 1996.

[81]  James M. Purtilo,et al.  A system for supporting multi-language versions for software fault tolerance , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[82]  Zary Segall,et al.  Is it possible to quantify the fault tolerance of distributed/parallel computer systems , 1990, Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage.

[83]  Santosh K. Shrivastava,et al.  Using objects and actions to provide fault tolerance in distributed, real-time applications , 1991, [1991] Proceedings Twelfth Real-Time Systems Symposium.

[84]  Jean Arlat,et al.  Fault injection for the formal testing of fault tolerance , 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.

[85]  S. Wicker Error Control Systems for Digital Communication and Storage , 1994 .

[86]  Avelino Francisco Zorzo,et al.  Rigorous development of a safety-critical system based on coordinated atomic actions , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[87]  Robbert van Renesse,et al.  Amoeba A Distributed Operating System for the 1990 s Sape , 1990 .

[88]  T. Anderson,et al.  Resilient computing systems: vol. 1 , 1986 .

[89]  Ming-Yee Lai,et al.  Software Fault Insertion Testing for Fault Tolerance , 1995 .

[90]  Gerald M. Masson,et al.  Certification of Computational Results , 1995, IEEE Trans. Computers.

[91]  Brian Randell,et al.  Software fault tolerance: t/(n-1)-variant programming , 1992 .

[92]  Nancy G Leveson,et al.  Software safety: why, what, and how , 1986, CSUR.

[93]  K. Echtle,et al.  Hardware and software fault tolerance using fail-silent virtual duplex systems , 1994, Proceedings of IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems.

[94]  Myron Hecht,et al.  Software reliability in the system context , 1986, IEEE Transactions on Software Engineering.

[95]  Peter J. Fleming,et al.  Dependable, intelligent voting for real-time control software , 1995 .

[96]  Dave E. Eckhardt,et al.  Fundamental differences in the reliability of N-modular redundancy and N-version programming , 1988, J. Syst. Softw..

[97]  LaprieJean-Claude,et al.  Definition and Analysis of Hardware- and Software-Fault-Tolerant Architectures , 1990 .

[98]  Algirdas Avizienis,et al.  The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.

[99]  Daniel P. Siewiorek,et al.  Automated robustness testing of off-the-shelf software components , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[100]  Jan Torin,et al.  Dependable flight control system using data diversity with error recovery , 1994 .

[101]  Peter G. Bishop Software Fault Tolerance by Design Diversity , 1995 .

[102]  Timothy C. K. Chou Beyond Fault Tolerance , 1997, Computer.

[103]  Richard Y. Kain,et al.  Vote Assignments in Weighted Voting Mechanisms , 1991, IEEE Trans. Computers.

[104]  Peter Neumann,et al.  Safeware: System Safety and Computers , 1995, SOEN.

[105]  David F. McAllister,et al.  Cost modeling of N-version fault-tolerant software systems for large N , 1996, IEEE Trans. Reliab..

[106]  Ravishankar K. Iyer,et al.  Efficient service of rediscovered software problems , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[107]  Gordon Mckinzie SUMMING UP THE 777'S FIRST YEAR: IS THIS A GREAT AIRPLANE, OR WHAT?. , 1996 .

[108]  Ravishankar K. Iyer,et al.  FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior under Faults , 1993, IEEE Trans. Software Eng..

[109]  Taesoon Park,et al.  Checkpointing and rollback-recovery in distributed systems , 1989 .

[110]  N. Leveson Software fault tolerance - The case for forward recovery , 1983 .

[111]  Jeffrey M. Voas,et al.  Faults on its sleeve: amplifying software reliability testing , 1993, ISSTA '93.

[112]  Timothy Fraser,et al.  Hardening COTS software with generic software wrappers , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[113]  Mark Russinovich,et al.  Application transparent fault management in fault tolerant Mach , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[114]  Peter M. Chen,et al.  How fail-stop are faulty programs? , 1998, Digest of Papers. Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing (Cat. No.98CB36224).

[115]  Dave E. Eckhardt,et al.  A theoretical investigation of generalized voters for redundant systems , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[116]  Francesca Saglietti,et al.  Software Fault Tolerance: Achievement and Assessment Strategies , 1992 .

[117]  Thomas I. McVittie,et al.  Implementing design diversity to achieve fault tolerance , 1991, IEEE Software.

[118]  Paul Ammann,et al.  Design fault tolerance , 1991 .

[119]  Michael R. Lyu Software Fault Tolerance , 1995 .

[120]  Suku Nair,et al.  Application layer software fault tolerance for distributed object-oriented systems , 1995, Proceedings Nineteenth Annual International Computer Software and Applications Conference (COMPSAC'95).

[121]  Bev Littlewood,et al.  Conceptual Modeling of Coincident Failures in Multiversion Software , 1989, IEEE Trans. Software Eng..

[122]  M. Pitarys,et al.  Software technology for next-generation strike fighter avionics , 1996, 15th DASC. AIAA/IEEE Digital Avionics Systems Conference.

[123]  Algirdas Avizienis,et al.  Toward Systematic Design of Fault-Tolerant Systems , 1997, Computer.

[124]  P. K. Lala,et al.  On self-checking software design , 1991, IEEE Proceedings of the SOUTHEASTCON '91.

[125]  Francesca Saglietti Strategies for the Achievement and Assessment of Software Fault-Tolerance , 1990 .

[126]  Ravishankar K. Iyer,et al.  Experimental analysis of computer system dependability , 1996 .

[127]  Jeffrey M. Voas,et al.  Certifying Off-the-Shelf Software Components , 1998, Computer.

[128]  Stephen R. Schach,et al.  Testing: principles and practice , 1996, CSUR.

[129]  Nancy G. Leveson,et al.  Safeware: System Safety and Computers , 1995 .

[130]  Stephen S. Yau,et al.  Object-oriented software development with fault tolerance for distributed real-time systems , 1996, Proceedings of WORDS'96. The Second Workshop on Object-Oriented Real-Time Dependable Systems.

[131]  Barry W. Johnson An introduction to the design and analysis of fault-tolerant systems , 1996 .

[132]  B. D. Aleksa,et al.  Boeing 777 airplane information management system operational experience , 1997, 16th DASC. AIAA/IEEE Digital Avionics Systems Conference. Reflections to the Future. Proceedings.

[133]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[134]  M. F.,et al.  Bibliography , 1985, Experimental Gerontology.

[135]  Karl-Erwin Großpietsch,et al.  Fault tolerance , 1994, IEEE Micro.

[136]  C. R. Spitzer All-digital jets are taking off: Aboard developmental commercial and military aircraft, digital electronics score high in compactness, control flexibility, and reliability , 1986, IEEE Spectrum.

[137]  A. Avizienis,et al.  Dependable computing: From concepts to design diversity , 1986, Proceedings of the IEEE.

[138]  K. S. Tso,et al.  Multi-Version Software Development , 1986 .

[139]  D. E. Eckhardt,et al.  A theoretical basis for the analysis of redundant software subject to coincident errors , 1985 .

[140]  Paul Ammann,et al.  Data Diversity: An Approach to Software Fault Tolerance , 1988, IEEE Trans. Computers.

[141]  Ravishankar K. Iyer,et al.  Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system , 1993, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.

[142]  K Tso,et al.  A reuse framework for software fault tolerance , 1995 .

[143]  Robert Bleeg,et al.  Commercial jet transport fly-by-wire architecture considerations , 1988 .

[144]  Victor F. Nicola,et al.  Checkpointing and the modeling of program execution time , 1994 .

[145]  Hoyt Lougee,et al.  SOFTWARE CONSIDERATIONS IN AIRBORNE SYSTEMS AND EQUIPMENT CERTIFICATION , 2001 .

[146]  Algirdas Avizienis,et al.  Software Fault Tolerance , 1989, IFIP Congress.

[147]  RICHARD KOO,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[148]  Jaynarayan H. Lala,et al.  Hardware and software fault tolerance: a unified architectural approach , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[149]  Ganesha Beedubail,et al.  An algorithm for supporting fault tolerant objects in distributed object oriented operating systems , 1995, Proceedings of International Workshop on Object Orientation in Operating Systems.

[150]  Cecília M. F. Rubira,et al.  Fault tolerance in concurrent object-oriented software through coordinated error recovery , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[151]  Philip Koopman,et al.  Comparing the robustness of POSIX operating systems , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[152]  Jean-Pierre Queille,et al.  Executable assertions and timed traces for on-line software error detection , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.

[153]  Jacob A. Abraham,et al.  Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.

[154]  Neeraj Suri,et al.  Advances in ULTRA-Dependable Distributed Systems , 1994 .

[155]  Hirokazu Ihara,et al.  Dependable onboard computer systems with a new method-stepwise negotiating voting , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[156]  H. Kopetz,et al.  The transparent implementation of fault tolerance in the time-triggered architecture , 1999, Dependable Computing for Critical Applications 7.

[157]  David P. Gluch,et al.  A Perspective on the State of Research in Fault-Tolerant Systems. , 1997 .

[158]  Roger M. Kieckhafer Task Reconfiguration in a Distributed Real-Time System , 1987, RTSS.

[159]  Herbert Hecht Fault-Tolerant Software , 1979, IEEE Transactions on Reliability.

[160]  Jean Arlat,et al.  Architectural Issues in Software Fault Tolerance , 1995 .

[161]  Louise E. Moser,et al.  The Totem system , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[162]  J. Terry Sims Redundancy Management Software Services for Seawolf Ship Control System , 1997, FTCS.

[163]  P. Lee Structuring software systems for fault tolerance , 1983 .

[164]  M. Deck Software reliability and the "Cleanroom" approach: a position paper , 1998, Annual Reliability and Maintainability Symposium. 1998 Proceedings. International Symposium on Product Quality and Integrity.

[165]  Hermann Kopetz,et al.  Fault tolerance, principles and practice , 1990 .

[166]  Gregory L. Greeley The effects of voting algorithms on N-version software reliability , 1987 .

[167]  Victor P. Nelson Fault-tolerant computing: fundamental concepts , 1990, Computer.

[168]  I-Ling Yen An object-oriented fault-tolerance framework based on specialization techniques , 1997, Proceedings Third International Workshop on Object-Oriented Real-Time Dependable Systems.

[169]  J. T. Sims,et al.  The Byzantine Generals Problem , 1982, TOPL.

[170]  Yiu-Wing Leung,et al.  Processor Assignment and Execution Sequence for Multiversion Software , 1997, IEEE Trans. Computers.

[171]  David Lorge Parnas,et al.  Evaluation of safety-critical software , 1990, CACM.

[172]  James P. Black,et al.  Redundancy in Data Structures: Improving Software Fault Tolerance , 1980, IEEE Transactions on Software Engineering.

[173]  Ken Sakamura,et al.  MLDD (Multi Layered Design Diversity) Architecture for Achieving High Design Fault Tolerance Capabilities , 1994, EDCC.

[174]  J. E. Potter,et al.  Extension of the midvalue selection technique for redundancy management of inertial sensors , 1986 .

[175]  Y. C. Yeh,et al.  Triple-triple redundant 777 primary flight computer , 1996, 1996 IEEE Aerospace Applications Conference. Proceedings.

[176]  Jean-Claude Laprie,et al.  Saturation: reduced idleness for improved fault-tolerance , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[177]  C. M. Krishna,et al.  Trade-offs in developing fault tolerant software , 1993 .

[178]  David J. Taylor,et al.  Redundancy in Data Structures: Some Theoretical Results , 1980, IEEE Transactions on Software Engineering.

[179]  Edward J. McCluskey,et al.  Executable assertions and flight software , 1984 .

[180]  Ytzhak H. Levendel,et al.  Defects and reliability analysis of large software systems: field experience , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[181]  T. W. Anderson,et al.  Resilient Computing Systems , 1987 .

[182]  J. Arlat,et al.  Assessment of COTS microkernels by fault injection , 1999, Dependable Computing for Critical Applications 7.

[183]  Anthony S. Wojcik,et al.  An Application of Formal Analysis to Software in a Fault-Tolerant Environment , 1999, IEEE Trans. Computers.

[184]  P. M. Melliar-Smith Development of software fault-tolerance techniques , 1983 .

[185]  David F. McAllister,et al.  Reliability of voting in fault-tolerant software systems for small output-spaces , 1990 .

[186]  Jim Gray,et al.  Why Do Computers Stop and What Can Be Done About It? , 1986, Symposium on Reliability in Distributed Software and Database Systems.

[187]  Francesca Saglietti The Impact of Voter Granularity in Fault-Tolerant Software on System Reliability and Avaiability , 1989 .

[188]  Bev Littlewood The impact of diversity upon common mode failures , 1996 .

[189]  Nancy G. Leveson,et al.  An Empirical Comparison of Software Fault Tolerance and Fault Elimination , 1991, IEEE Trans. Software Eng..

[190]  Andrew S. Tanenbaum,et al.  The Amoeba Distributed Operating System , 1992 .

[191]  J. H. Lala,et al.  Architectural principles for safety-critical real-time applications , 1994, Proc. IEEE.

[192]  Michael G. Daughan SEAWOLF SUBMARINE SHIP CONTROL SYSTEM : A CASE STUDY OF A FAULT- TOLERANT DESIGN , 1994 .

[193]  Santosh K. Shrivastava,et al.  Replication within atomic actions and conversations: a case study in fault-tolerance duality , 1989, [1989] The Nineteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[194]  Cnrs-Laas Digest of papers : the Twenty-Third International Symposium on Fault-Tolerant Computing : FTCS 23, June 22-24, 1993, Toulouse, France , 1993 .

[195]  Jaynarayan H. Lala,et al.  Reducing the probability of common-mode failure in the fault tolerant parallel processor , 1993, [1993 Proceedings] AIAA/IEEE Digital Avionics Systems Conference.

[196]  Pascal Traverse Dependability of Digital Computers on Board Airplanes , 1991 .