Simple models of hardware and software fault tolerance

This paper presents a quantitative analysis of three architectural approaches to integrating hardware and software fault tolerance. Using a common set of assumptions and hypothetical parameter values, the authors compare the reliability of distributed recovery blocks (DRB), N-version programming (NVP), and N self-checking programming (NSCP). A combination of fault trees and Markov reward models is used to account for transient and permanent physical faults and for independent and related software faults. The fault tree models capture the combinations of software faults and hardware transients that can upset a single task computation. The structure states of the Markov reward process capture the longer-term behavior of the system as it is reconfigured in response to permanent faults. In addition to a base case, several scenarios are considered, including perfect specifications, independent versions, a perfect decider, and perfect coverage. In most cases, DRB is found to be the most reliable.
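
To make the fault tree idea concrete, the following is a minimal sketch of a toy fault tree for a single task computation in a three-version NVP configuration. It is not the authors' model: the tree structure, the function name `nvp_task_failure_prob`, and both parameters are illustrative assumptions. It assumes fully independent version faults and ignores related faults, hardware transients, and coverage.

```python
def nvp_task_failure_prob(p_version: float, p_decider: float) -> float:
    """Top-event probability of a toy fault tree (illustrative assumption).

    The task computation fails if the decider fails OR at least two of the
    three versions fail. All basic events are assumed independent.
    """
    # 2-of-3 gate: exactly two versions fail, or all three fail.
    p_majority_lost = 3 * p_version**2 * (1 - p_version) + p_version**3
    # OR gate over two independent events.
    return 1 - (1 - p_decider) * (1 - p_majority_lost)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(nvp_task_failure_prob(p_version=0.1, p_decider=0.0))
```

In a combined model of the kind the abstract describes, a per-task failure probability like this one would be attached to each structure state of the Markov process, so that it changes as the system is reconfigured after permanent faults.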
