Choosing Between Fault-Tolerance and Increased V&V for Improving Reliability

Fault-tolerant systems based on software design diversity may achieve high levels of reliability more cost-effectively than other approaches, such as heroic debugging. Earlier experiments have shown that multi-version software systems are more reliable than their individual versions. However, it is also clear that the reliability gains are much smaller than naive assumptions of failure independence between the versions would suggest. To decide whether to use design diversity or other means of achieving the desired reliability, a developer would need to know how they compare in cost-effectiveness. Empirical data are insufficient for deciding this question, and expert opinions differ. We refute a recently published argument in favour of diversity and, in the process, identify some general factors that determine whether process improvement, or debugging of the versions in a multiple-version system, will increase or decrease the statistical correlation between failures of the versions. We conclude that there is as yet no evidence that the choice between design diversity and other means of reliability improvement can be decided by general arguments rather than by detailed (and uncertain) special-case analysis.
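
The gap between the independence assumption and the observed benefit of diversity can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes a simple "difficulty function" model in which each class of demands has its own probability of making a randomly developed version fail, and in which two versions fail independently only once the demand class is fixed. The function name, the two demand classes and all numbers are hypothetical.

```python
def coincident_failure_probabilities(difficulty, usage):
    """Compare naive and model-based failure probabilities for a 1-out-of-2 pair.

    difficulty[i]: assumed probability that a randomly developed version
                   fails on a demand of class i (hypothetical values).
    usage[i]:      assumed probability that a demand belongs to class i.
    """
    # Probability that a single version fails on a random demand.
    single = sum(u * d for u, d in zip(usage, difficulty))
    # Naive estimate: the two versions fail independently of each other.
    naive_pair = single ** 2
    # Model estimate: independence holds only within a demand class, so the
    # pair fails with the usage-weighted mean of the squared difficulty.
    model_pair = sum(u * d * d for u, d in zip(usage, difficulty))
    return single, naive_pair, model_pair


# Hypothetical profile: most demands are easy, a small class is hard.
usage = [0.9, 0.1]
difficulty = [0.0, 0.01]

single, naive_pair, model_pair = coincident_failure_probabilities(difficulty, usage)
print(f"single version: {single:.2e}")
print(f"pair, naive independence: {naive_pair:.2e}")
print(f"pair, varying difficulty: {model_pair:.2e}")
```

Under these assumed numbers the pair is still more reliable than a single version, but about ten times less reliable than the independence assumption predicts. The size of the gap depends on how much the difficulty varies across demand classes; when it does not vary at all, the two estimates coincide.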
