Reliability analysis of static and dynamic fault-tolerant systems subject to probabilistic common-cause failures

Fault-tolerant systems designed with redundancy techniques are typically subject to common-cause failures, which are multiple dependent component failures caused by a shared root cause, also known as a common cause or shock. There are two types of shocks: fatal and non-fatal. A fatal shock (FS) fails all components of a system. A non-fatal shock (NFS) affects only a subset of system components. Most existing shock models assume that the occurrence of an NFS results in deterministic and simultaneous failures of the affected components. In practice, however, an NFS may cause different components to fail with different probabilities. This behaviour is referred to as probabilistic NFS. In this paper, we consider the effects of probabilistic NFS in the reliability analysis of fault-tolerant systems. Both an explicit method and an implicit method are proposed for incorporating probabilistic NFS in the reliability analysis of static systems. A Markov approach combined with the Poisson decomposition law is proposed for incorporating probabilistic NFS in the reliability analysis of dynamic systems. The proposed approaches are illustrated through the analyses of several examples.
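As a rough illustration of what a probabilistic NFS means numerically, the following minimal sketch computes the reliability of a two-component parallel system. It is not the paper's explicit, implicit, or Markov method. It assumes exponential local failures, a single NFS arriving as a Poisson process, and components that fail independently of each other given that a shock occurs. The function name, parameter names, and numeric values are all hypothetical. The only result borrowed from the abstract is the Poisson decomposition idea: if shocks arrive at rate shock_rate and each shock fails a component with probability q, the shocks that actually fail that component arrive at rate shock_rate * q.

```python
import math


def parallel_reliability(t, rate_a, rate_b, shock_rate, q_a, q_b):
    """Reliability at time t of a two-component parallel system (illustrative only).

    Assumptions (not taken from the paper): components A and B have exponential
    local failure rates rate_a and rate_b, a single non-fatal shock arrives as a
    Poisson process with rate shock_rate, and each shock fails A with
    probability q_a and B with probability q_b, independently of each other.
    """
    # Poisson decomposition: shocks that fail A form a Poisson process
    # with rate shock_rate * q_a (likewise for B).
    surv_a = math.exp(-(rate_a + shock_rate * q_a) * t)
    surv_b = math.exp(-(rate_b + shock_rate * q_b) * t)

    # Both components survive only if no shock fails either of them;
    # a shock fails at least one component with probability q_any.
    q_any = 1.0 - (1.0 - q_a) * (1.0 - q_b)
    surv_both = math.exp(-(rate_a + rate_b + shock_rate * q_any) * t)

    # The parallel system works if at least one component is alive
    # (inclusion-exclusion over the two survival events).
    return surv_a + surv_b - surv_both


# Hypothetical numbers: compare a probabilistic NFS with the deterministic
# special case q_a = q_b = 1, where every shock fails both components.
probabilistic = parallel_reliability(10.0, 0.01, 0.02, 0.05, 0.3, 0.6)
deterministic = parallel_reliability(10.0, 0.01, 0.02, 0.05, 1.0, 1.0)
print(probabilistic, deterministic)
```

Setting q_a = q_b = 1 recovers the deterministic NFS model that the abstract attributes to most existing work, and setting shock_rate to 0 recovers two independent components, so the sketch shows how the probabilistic case sits between those two extremes.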
