Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements of safety-critical functions in advanced aircraft. Errors are masked by majority voting over the results of identical computations, and faulty processors are removed from service by reassigning their computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor-to-processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization, are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.
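
The two mechanisms named in the abstract, masking errors by majority vote and reassigning work away from faulty processors, can be illustrated with a minimal Python sketch. It is not the SIFT executive: the function names, the dictionary-based data layout, the replication level of three, and the task and processor names are all assumptions made for illustration, since the abstract does not give these details.

```python
from collections import Counter


def majority_vote(results):
    """Vote over replicated results (hypothetical helper, not SIFT code).

    `results` maps a processor id to the value it computed.
    Returns (value, dissenters); value is None when no strict majority exists.
    """
    value, count = Counter(results.values()).most_common(1)[0]
    if 2 * count <= len(results):
        return None, set()  # no strict majority, so nothing can be masked
    dissenters = {p for p, r in results.items() if r != value}
    return value, dissenters


def reassign(assignment, processors, faulty, replicas=3):
    """Move tasks off faulty processors (assumed policy: first healthy spare).

    `assignment` maps a task name to the list of processors running it.
    """
    healthy = [p for p in processors if p not in faulty]
    updated = {}
    for task, procs in assignment.items():
        kept = [p for p in procs if p not in faulty]
        spares = [p for p in healthy if p not in kept]
        needed = max(0, replicas - len(kept))
        updated[task] = kept + spares[:needed]
    return updated


# Three replicas of one task; p3 returns a wrong value and is outvoted.
value, bad = majority_vote({"p1": 42, "p2": 42, "p3": 7})
plan = reassign({"pitch": ["p1", "p2", "p3"]}, ["p1", "p2", "p3", "p4"], bad)
```

In this sketch a single disagreement is enough to mark a processor as faulty. A real system would more plausibly accumulate error reports over time before reconfiguring, and the abstract does not say which policy SIFT uses.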
