On the value of redundancy subject to common-cause failures: Toward the resolution of an on-going debate

Abstract
Common-cause failures (CCF) are among the more critical and challenging issues in system reliability and risk analysis. Academic interest in modeling CCF, and more broadly in modeling dependent failures, has grown steadily over the years, both in the number of publications and in the sophistication of the analytical tools used. In the past few years, several influential articles have cast doubt on the relevance of redundancy, arguing that "redundancy backfires" through common-cause failures and that the latter dominate unreliability, thus defeating the purpose of redundancy. In this work, we take issue with some of the results of these publications. In their place, we provide a nuanced perspective on the (contingent) value of redundancy subject to common-cause failures. First, we review the incremental reliability and mean time to failure (MTTF) provided by redundancy subject to common-cause failures. Second, we introduce the concept of a "redundancy–relevance boundary" and develop its analytics, proposing it as a design aid that answers the following question: what level of redundancy is relevant, or advantageous, given a varying prevalence of common-cause failures? We investigate the conditions under which different levels of redundancy provide an incremental MTTF over that of a single component in the face of common-cause failures. Recognizing that redundancy comes at a cost, we also conduct a cost–benefit analysis of redundancy subject to common-cause failures and demonstrate how this analysis modifies the redundancy–relevance boundary. We show that the value of redundancy is contingent on the prevalence of common-cause failures, the redundancy level considered, and the monadic cost–benefit ratio.
Finally, we argue that general, unqualified criticism of redundancy is misguided, and that efforts are better spent on, for example, understanding and mitigating the potential sources of common-cause failures than on deriding the concept of redundancy in system design.
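The question of whether n redundant units outlive a single unit once common-cause failures are accounted for can be made concrete with a short calculation. The sketch below assumes a simple illustrative model that is not necessarily the one used in the paper: each unit fails independently at a constant rate lam, and a redundant configuration (n >= 2) is additionally exposed to a common-cause shock at rate lam_c = rho * lam that fails all units at once, while a lone unit is not. The function names mttf_parallel and relevance_boundary, and the min_ratio parameter (a crude, hypothetical stand-in for a cost hurdle), are introduced here for illustration only.

```python
from math import comb


def mttf_parallel(n: int, lam: float, lam_c: float) -> float:
    """MTTF of n identical units in active parallel (illustrative CCF model).

    Assumption: each unit fails independently at constant rate lam; for n >= 2
    a common-cause shock at rate lam_c fails every unit simultaneously. A
    single unit (n == 1) is assumed not to be exposed to that shock.
    R_n(t) = exp(-lam_c*t) * (1 - (1 - exp(-lam*t))**n), integrated term by term.
    The alternating sum loses precision for large n; keep n small (about <= 10).
    """
    if n == 1:
        return 1.0 / lam
    return sum(
        (-1) ** (k + 1) * comb(n, k) / (k * lam + lam_c) for k in range(1, n + 1)
    )


def relevance_boundary(n: int, min_ratio: float = 1.0, tol: float = 1e-10) -> float:
    """Largest rho = lam_c / lam for which MTTF_n >= min_ratio * MTTF_1.

    min_ratio = 1 is the pure-reliability break-even; min_ratio > 1 is a
    hypothetical hurdle that redundancy must clear to be worth its cost.
    Returns 0.0 if the hurdle cannot be met even without common-cause failures.
    """
    lam = 1.0  # the comparison depends only on rho, so normalize lam to 1
    target = min_ratio * mttf_parallel(1, lam, 0.0)

    def gain(rho: float) -> float:
        # MTTF_n decreases as rho grows, so gain is monotonically decreasing
        return mttf_parallel(n, lam, rho * lam) - target

    if gain(0.0) <= 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while gain(hi) > 0:  # widen the bracket until the break-even point is inside
        hi *= 2
    while hi - lo > tol:  # bisection on the monotone gain function
        mid = (lo + hi) / 2
        if gain(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, round(relevance_boundary(n), 3), round(relevance_boundary(n, 1.2), 3))
```

Under these assumptions a parallel pair breaks even at rho = √2 − 1 ≈ 0.414, a triple tolerates a common-cause rate of roughly 0.6 of the independent rate, and raising min_ratio above 1 shrinks the region in which redundancy is worthwhile. Note that under the classical beta-factor parameterization, where each unit's total failure rate is held fixed and only a fraction of it is common-cause, a parallel configuration never has a lower MTTF than a single unit; a break-even point appears only when redundancy adds common-cause exposure on top of the independent failure rate, as assumed here.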
