Software Fault Tolerance Techniques and Implementation

Software Fault Tolerance Techniques and Implementation examines key programming techniques such as assertions, checkpointing, and atomic actions, and provides design tips and models to assist in developing critical fault-tolerant software that helps ensure dependable performance. From software reliability, recovery, and redundancy to design-diverse and data-diverse software fault tolerance techniques, this practical reference provides detailed insight into techniques that can improve the overall dependability of software.
