An Evaluation of Software Fault Tolerance in a Practical System

An experimental project to assess the effectiveness of software fault tolerance techniques is described. Techniques were developed for, and applied to, a realistic implementation of a practical real-time system, namely a naval command and control system. Reliability data was collected by running this system with a simulated tactical environment for a variety of action scenarios. Analysis of the data confirms that software fault tolerance techniques can significantly enhance system reliability.
