Fault-tolerant computing: fundamental concepts

The basic concepts of fault-tolerant computing are reviewed, with a focus on hardware. Failures, faults, and errors in digital systems are examined, and measures of dependability, which both dictate and evaluate fault-tolerance strategies for different classes of applications, are defined. The elements of fault-tolerance strategies are identified, and various strategies are reviewed: error detection, masking, and correction; error detection and correction codes; self-checking logic; module replication for error detection and masking; protocol and timing checks; fault containment; reconfiguration and repair; and system recovery.
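As an illustration of one strategy named above, module replication for error detection and masking, the following is a minimal sketch of a majority voter over replicated module outputs. It is not taken from the paper: the function name `vote` and its return convention are assumptions made for this example, and a hardware voter would operate on signals rather than Python values.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Hypothetical majority voter over the outputs of replicated modules.

    Returns the majority value (the masked result) and the indices of the
    modules that disagreed with it (candidates for reconfiguration or repair).
    Raises RuntimeError when no strict majority exists: the error is then
    detected but cannot be masked.
    """
    counts = Counter(outputs)
    value, n = counts.most_common(1)[0]
    if n <= len(outputs) // 2:
        raise RuntimeError("no majority: error detected but not masked")
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects


# Three replicas, one of which produced a wrong result.
result, faulty = vote([5, 5, 7])
print(result, faulty)
```

With three replicas, a single faulty module is outvoted and also identified by its index; two disagreeing modules leave no strict majority, so the sketch only reports the error instead of masking it.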
