Fault tolerance, principles and practice

1 Introduction.- Fault Prevention and Fault Tolerance.- Anticipated and Unanticipated Faults.- Book Aim.- References.
2 System Structure and Dependability.- System Structure.- Systems.- System Model.- Software/Hardware Interaction.- Interpreter Model of Systems.- Component Model of Systems.- Measures and Mechanisms.- Atomic Actions.- System Dependability and Reliability.- Dependability.- Failure and Reliability.- System Specification.- Multiple Specifications.- Erroneous Transitions and States.- Component/Design Failures.- Errors and Faults.- Fault Classifications.- Summary.- References.
3 Fault Tolerance.- Fault Tolerance: How.- Principles of Fault Tolerance.- Redundancy.- Fault Tolerance: Where and How Much.- Quantitative Reliability Evaluation.- Hardware Reliability Models.- Software Reliability Models.- An Implementation Framework.- Exceptions and Exception Handling.- Classification of Exceptions.- Exception Handling in Software Systems.- Exception Propagation.- Summary of Exception Handling.- References.
4 Fault Tolerant Systems.- ESS No. 1A.- System Description.- Reliability Strategies.- SIFT and FTMP.- SIFT System Design.- SIFT Reliability Strategies.- FTMP System Design.- FTMP Reliability Strategies.- Tandem.- Tandem Reliability Strategies.- Stratus.- Stratus Reliability Strategies.- References.
5 Error Detection.- Measures for Error Detection.- Ideal Checks.- Types of Check.- Replication Checks.- Timing Checks.- Reversal Checks.- Coding Checks.- Reasonableness Checks.- Structural Checks.- Diagnostic Checks.- Mechanisms for Error Detection.- Structuring Error Detection in Systems.- References.
6 Damage Confinement and Assessment.- Damage Confinement.- Measures for Damage Confinement.- Measures for Damage Assessment.- Mechanisms for Damage Confinement.- Protection Mechanisms.- Mechanisms for Damage Assessment.- Summary.- References.
7 Error Recovery.- Concepts of Error Recovery.- State Restoration.- Forward and Backward Error Recovery.- Measures for Forward Error Recovery.- Backward Error Recovery.- Facilities for Backward Error Recovery.- Measures for Backward Error Recovery.- Mechanisms for Backward Error Recovery.- Checkpoints and Audit Trails.- The Recovery Cache.- Unrecoverable Components.- Recovery in Hierarchical Systems.- Recovery in Concurrent Systems.- Concurrent Processes.- Recovery for Competing Processes.- Recovery for Cooperating Processes.- Distributed Systems.- Recovery in Idealised Fault Tolerant Components.- Summary.- References.
8 Fault Treatment and Continued Service.- Fault Location.- System Repair.- Resuming Normal Service.- Idealised Fault Tolerant Components.- Summary.- References.
9 Software Fault Tolerance.- The Recovery Block Scheme.- Implementation of Recovery Blocks.- The Utility of Recovery Blocks.- Acceptance Tests.- Run-Time Overheads.- Experiments with Recovery Blocks.- Summary of Recovery Blocks.- The N-Version Programming Scheme.- Implementation of N-Version Programming.- Voting Check.- Experiments with N-Version Programming.- Summary of N-Version Programming.- Comparison with the Recovery Block Scheme.- Summary.- References.
10 Conclusion.- Methodology and Framework for Fault Tolerance.- Idealised Fault Tolerant Components.- Failure Exceptions.- Critical Components.- The Future.- References.
References.
Annotated Bibliography.- Multiple Sources.- Fault Tolerant Systems.- August Systems.- COMTRAC.- COPRA.- C.vmp.- ESS Systems (Bell Laboratories).- Fault Tolerant Multiprocessor (FTMP).- Fault Tolerant Spaceborne Computer (FTSC).- IBM 9020.- JPL-STAR Computer.- MARS.- Plessey System 250.- Pluribus.- PRIME.- Sequoia.- Software Implemented Fault Tolerance (SIFT).- Space Shuttle Computer Complex.- Stratus.- Tandem.- VOTRICS.- Software Fault Tolerance.- Multiple Sources.- Recovery Blocks.- N-Version Programming.- Other Software Fault Tolerance Papers.- Exception Handling.