A failure-distance based method to bound the reliability of nonrepairable fault-tolerant systems without the knowledge of minimal cuts

Continuous-time Markov chains (CTMCs) are a commonly used formalism for modeling fault-tolerant systems. One of their major drawbacks is the well-known state-space explosion problem. This paper develops and analyzes a method (SC-BM) that computes bounds for the reliability of nonrepairable fault-tolerant systems while generating only a portion of the state space of the CTMC. Like the method described previously by the authors (1997), SC-BM uses the failure distance concept. Unlike that method, which relies on exact failure distances, SC-BM uses lower bounds for failure distances computed on the system fault tree, which avoids computing and storing all minimal cuts as the earlier work requires. This is important because computing all minimal cuts is NP-hard and the number of minimal cuts can be very large. In some cases SC-BM gives exactly the same bounds as the previous method; in other cases it gives less tight bounds. SC-BM computes tight bounds for the reliability of quite complex systems with an affordable number of generated states for short to quite large mission times. The analysis of several examples suggests that the bounds obtained by SC-BM appreciably outperform those obtained by simpler methods and, when they are not equal, are only slightly worse than the bounds obtained by the previous method. In addition, the CPU-time overhead of computing lower bounds for failure distances appears reasonable.
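To illustrate how a lower bound on a failure distance can be obtained from a fault tree without enumerating minimal cuts, the following Python sketch may help. It takes the failure distance of a state to be the minimum number of additional component failures needed to fail the system. The `Node` class, the function name `distance_lower_bound`, and the gate rules (minimum over the inputs of an OR gate, maximum over the inputs of an AND gate) are assumptions made for illustration only; the abstract does not say how SC-BM computes its lower bounds, and the actual procedure may differ.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical fault-tree node: a basic event or an AND/OR gate."""
    kind: str                      # "basic", "and", or "or"
    name: str = ""                 # component name (basic events only)
    children: List["Node"] = field(default_factory=list)


def distance_lower_bound(node: Node, failed: Set[str]) -> int:
    """Lower bound on the number of additional component failures
    needed to make `node` true, given the set of failed components."""
    if node.kind == "basic":
        # A failed component needs no further failures; otherwise one.
        return 0 if node.name in failed else 1
    bounds = [distance_lower_bound(c, failed) for c in node.children]
    if node.kind == "or":
        # Making any single input true suffices.
        return min(bounds)
    if node.kind == "and":
        # Every input must become true, so at least the largest bound
        # is needed. Summing could overcount shared basic events.
        return max(bounds)
    raise ValueError(f"unknown node kind: {node.kind}")


# Assumed example: the system fails if (a AND b) OR (b AND c).
a, b, c = (Node("basic", n) for n in "abc")
top = Node("or", children=[
    Node("and", children=[a, b]),
    Node("and", children=[b, c]),
])

bound_no_failures = distance_lower_bound(top, set())
bound_b_failed = distance_lower_bound(top, {"b"})
bound_ab_failed = distance_lower_bound(top, {"a", "b"})
```

In this example the exact failure distance with no failed components is 2 (the minimal cuts are {a, b} and {b, c}), whereas the sketch returns 1. That gap is the kind of looseness behind the abstract's remark that bounds based on lower bounds for failure distances can be less tight than those based on exact distances.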

[1]  Kishor S. Trivedi, et al. Reliability Modeling Using SHARPE, 1987, IEEE Transactions on Reliability.

[2]  V. Suñé, et al. A method for the computation of reliability bounds for non-repairable fault-tolerant systems, 1997, Proceedings Fifth International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems.

[3]  Joanne Bechta Dugan, et al. Dependability assessment using binary decision diagrams (BDDs), 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[4]  Donald Gross, et al. The Randomization Technique as a Modeling Tool and Solution Procedure for Transient Markov Processes, 1984, Oper. Res.

[5]  Sheldon M. Ross, et al. Stochastic Processes, 2018, Gauge Integral Structures for Stochastic Calculus and Quantum Electrodynamics.

[6]  Jacob A. Abraham, et al. A Numerical Technique for the Hierarchical Evaluation of Large, Closed Fault-Tolerant Systems, 1992.

[7]  James L. Peterson. Computation Sequence Sets, 1976, J. Comput. Syst. Sci.

[8]  Takehisa Kohda, et al. Finding modules in fault trees, 1989.

[9]  Joanne Bechta Dugan, et al. Fault trees and imperfect coverage, 1989.

[10]  Ralph A. Evans, et al. IEEE transactions on reliability, 2004, IEEE Transactions on Reliability.

[11]  Yves Dutuit, et al. A linear-time algorithm to find modules of fault trees, 1996, IEEE Trans. Reliab.

[12]  V. Suñé, et al. An algorithm to find minimal cuts of coherent fault-trees with event-classes, using a decision tree, 1999.

[13]  Daniel P. Heyman, et al. Stochastic models in operations research, 1982.

[14]  Malathi Veeraraghavan, et al. An Approach to Solving Large Reliability Models, 1988.