Software fault tolerance has primarily been aimed at increasing overall software reliability. Unfortunately, it is impossible to provide general techniques that tolerate all faults with very high confidence. This paper presents some of the available experimental evidence. In some situations, however, a more limited form of fault tolerance may be all that is needed: the program must either prevent unsafe states (but not necessarily all incorrect states) or detect them and recover to a safe (but not necessarily correct) state. This approach is application-specific; the fault-tolerance facilities are designed for the particular application. This paper briefly describes how this can be accomplished. Although this approach requires a more specific analysis of the problem than the more general ones, it has the advantage of allowing partial verification of the adequacy of the fault tolerance used (e.g., it is possible to show that certain hazardous states cannot be caused by software faults), and it will therefore aid in certifying and licensing software whose failure could have catastrophic consequences. That is, the approach provides greater confidence about a more limited goal than more general approaches do. These techniques can also be used to tailor more general fault-tolerance techniques, such as recovery blocks, and to aid in writing acceptance tests that ensure safety. Even with these techniques, it may not be possible to build systems with very low acceptable risk using software components.
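As an illustration of an acceptance test that checks safety rather than correctness, the following is a minimal sketch of a recovery block in Python. It is not taken from the paper: the heater-controller domain, the names `is_safe`, `recovery_block`, `primary`, and `backup`, and the limits `MAX_SAFE_POWER` and `SAFE_STATE` are all hypothetical assumptions chosen for the example.

```python
from typing import Callable, Sequence

# Hypothetical example domain: a heater controller. All names and limits
# below are illustrative assumptions, not taken from the paper.
MAX_SAFE_POWER = 100.0  # assumed hazard threshold for commanded power
SAFE_STATE = 0.0        # assumed known-safe output (heater off)


def is_safe(power: float) -> bool:
    """Acceptance test for safety only; it does not check correctness."""
    return 0.0 <= power <= MAX_SAFE_POWER


def recovery_block(
    alternates: Sequence[Callable[[float], float]], reading: float
) -> float:
    """Return the first alternate's result that passes the safety test."""
    for alternate in alternates:
        try:
            result = alternate(reading)
        except Exception:
            continue  # a crashed alternate is treated like a failed test
        if is_safe(result):
            return result
    # No alternate produced a safe result: recover to the safe state,
    # which is safe but not necessarily correct.
    return SAFE_STATE


def primary(reading: float) -> float:
    # Deliberately faulty alternate: the commanded power is unbounded.
    return (20.0 - reading) * 50.0


def backup(reading: float) -> float:
    # Simpler alternate that clamps its output to the safe range.
    return min(max((20.0 - reading) * 10.0, 0.0), MAX_SAFE_POWER)
```

In this sketch, a call such as `recovery_block([primary, backup], 10.0)` rejects the primary result because it exceeds the assumed limit and accepts the backup result instead. If every alternate fails the test, the block returns `SAFE_STATE`. The test says nothing about whether an accepted value is the right one, only that it is not hazardous under the stated assumptions.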