POSITION STATEMENT

For the past two decades, various software fault-tolerance (FT) schemes have been proposed, e.g., N-Version Programming, Self-checking, and Recovery Block schemes, among others. Yet few real systems have incorporated software fault-tolerance schemes in practice. The reluctance to use them stems from some formidable sources: (a) inherent complexity and development risk, (b) the high cost of HW & SW redundancy, (c) realization of the acceptability test logic (including the overhead imposed on performance), and (d) the lack of trustworthy evaluation methods for determining system reliability. We seek to better understand the source and mechanism of software failures, and to identify the software fault-tolerance mechanism most appropriate for a particular class of failures. We are attempting to relate the failure behavior of software to the formal specification of the software system at higher levels.

RELATED WORK: RELIABILITY GROWTH TESTING

Current approaches utilize reliability growth testing, which is highly dependent on the predictive validity of the model, the test coverage, and the operational profile. These approaches often employ goodness-of-fit and recalibration techniques to enable the user to gauge how well the model is working. Software reliability can be predicted based on measurable characteristics of the software development process and its artifacts. A program's failure rate is related to the fault hazard rate profile. Unfortunately, the hazard rate profile is usually determined by "fault seeding" or by retrospective failure analysis. Under a particular operational profile, the same information (i.e., the frequency with which potential faults are encountered) can be provided by adding randomly placed counters within the code.
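The counter idea can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: `probe` stands in for an inserted counter, `classify` is a toy program under test, and the probe sites are fixed by hand rather than placed randomly. The sketch only shows how encounter frequencies per site could be tallied for a given operational profile.

```python
from collections import Counter

# Hypothetical global tally: how often each instrumented site is reached.
hits = Counter()

def probe(site):
    """Stand-in for an inserted counter: record one visit to `site`."""
    hits[site] += 1

def classify(x):
    """Toy program under test, instrumented at three hand-picked sites."""
    probe("entry")
    if x < 0:
        probe("negative_branch")
        return "negative"
    if x == 0:
        probe("zero_branch")
        return "zero"
    return "positive"

def run_profile(inputs):
    """Run the program over a sample of the operational profile and
    return the fraction of runs in which each site was encountered."""
    hits.clear()
    for x in inputs:
        classify(x)
    n = len(inputs)
    return {site: count / n for site, count in hits.items()}

if __name__ == "__main__":
    # Assumed operational profile: a small, explicit list of inputs.
    print(run_profile([-1, 0, 1, 2]))
```

A site reached in only a small fraction of runs contributes to the failure rate only as often as the profile exercises it, which is the information the hazard rate profile needs.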
SYSTEMS LEVEL: INHERENT RELIABILITY OF A SW FT DESIGN CANDIDATE

Three different result spaces are possible from software fault-tolerance (see Figure 1):
1) Intended or correct results, shown by the horizontally oriented oval, which fulfill the intention of the user and are defined by the system requirements,
2) Actual results, those produced by the system (oval at 45°), and
3) Accepted results, those admitted by the error detection module as being tolerable (vertical oval).

The relationship between these three result sets makes possible four state categories (see tree structure):
i) No error: the actual result is correct and accepted,
ii) False alarm: the actual result is correct but not accepted,
iii) Missing alarm: the actual result is not correct but accepted, and
iv) Detected error: the actual result is not correct and not accepted.
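The four state categories follow directly from two yes/no questions: is the actual result correct, and is it accepted? A minimal Python sketch of that mapping is given below; the names `Outcome` and `categorize` are assumptions made for illustration and do not appear in the paper.

```python
from enum import Enum

class Outcome(Enum):
    NO_ERROR = "no error"              # correct and accepted
    FALSE_ALARM = "false alarm"        # correct but not accepted
    MISSING_ALARM = "missing alarm"    # not correct but accepted
    DETECTED_ERROR = "detected error"  # not correct and not accepted

def categorize(correct: bool, accepted: bool) -> Outcome:
    """Map the (correct, accepted) pair onto one of the four categories."""
    if correct:
        return Outcome.NO_ERROR if accepted else Outcome.FALSE_ALARM
    return Outcome.MISSING_ALARM if accepted else Outcome.DETECTED_ERROR

if __name__ == "__main__":
    for correct in (True, False):
        for accepted in (True, False):
            print(correct, accepted, categorize(correct, accepted).value)
```

Of the four, only the missing alarm lets an incorrect result escape undetected, while the false alarm costs recovery effort for a result that was in fact correct.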