Evaluation and comparison of fault-tolerant software techniques

Four implementations of fault-tolerant software techniques are evaluated with respect to hardware and design faults. Project participants were divided into four groups, each of which developed fault-tolerant software based on a common specification. Each group applied one of the following techniques: N-version programming, recovery block, concurrent error-detection, and algorithm-based fault tolerance. Independent testing and modeling groups analyzed the software. The testing group subjected it to simulated design and hardware faults. The data were then mapped into a discrete-time Markov model developed by the modeling group. The effectiveness of each technique with respect to availability, correctness, and time to failure given an error, as shown by the model, is contrasted with measured data. The model is analyzed with respect to additional figures of merit identified during the modeling process, and the techniques are ranked using an application taxonomy. >

[1]  D. P. Siemwiorek Architecture of fault-tolerant computers: an historical perspective , 1991 .

[2]  Jacob A. Abraham,et al.  Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.

[3]  David F. McAllister,et al.  An Experimental Evaluation of Software Redundancy as a Strategy For Improving Reliability , 1991, IEEE Trans. Software Eng..

[4]  Jim Gray,et al.  A census of Tandem system availability between 1985 and 1990 , 1990 .

[5]  Kishor S. Trivedi,et al.  Reliability Modeling Using SHARPE , 1987, IEEE Transactions on Reliability.

[6]  Nancy G. Leveson,et al.  An experimental evaluation of the assumption of independence in multiversion programming , 1986, IEEE Transactions on Software Engineering.

[7]  Peter A. Barrett,et al.  Software Fault Tolerance: An Evaluation , 1985, IEEE Transactions on Software Engineering.

[8]  Kishor S. Trivedi,et al.  Reliability estimation of fault-tolerant systems: tools and techniques , 1990, Computer.

[9]  James P. Black,et al.  Redundancy in Data Structures: Improving Software Fault Tolerance , 1980, IEEE Transactions on Software Engineering.

[10]  David J. Taylor,et al.  Redundancy in Data Structures: Some Theoretical Results , 1980, IEEE Transactions on Software Engineering.

[11]  Algirdas Avizienis,et al.  The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.

[12]  Daniel P. Siewiorek,et al.  FIAT-fault injection based automated testing environment , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[13]  Brian Randell System structure for software fault tolerance , 1975 .

[14]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.