System-level reliability analysis considering imperfect fault coverage

Safety-critical systems rely on redundancy schemes such as k-out-of-n structures to tolerate multiple faults. These schemes are subject to Imperfect Fault Coverage (IFC): error detection and recovery may themselves be error-prone, or even impossible for certain fault models. As a result, a redundancy scheme may act as a single point of failure, where an uncovered fault goes unnoticed and leads to wrong system outputs. Neglecting IFC in reliability analysis can lead to a dangerous overestimation of reliability in safety-critical applications. Yet existing techniques that do consider IFC are overly pessimistic, because they assume that an uncovered fault always results in a system failure. Often, particularly in complex systems with nested redundant structures, a fault that is not noticed by an inner redundancy scheme may still be caught by an outer one. This paper proposes to automatically incorporate IFC into reliability models, namely Binary Decision Diagrams (BDDs), to enable accurate reliability analysis of complex system structures that include nested redundancies and repeated components. It also shows that IFC does not affect all redundancy schemes equally. Experimental results for applications from the multimedia and automotive domains confirm that the proposed approach analyzes system reliability more accurately than the underlying IFC-unaware technique, with acceptable execution-time and memory overhead.
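To make the pessimistic baseline concrete, the following minimal sketch computes the reliability of a single k-out-of-n structure under the common assumption that any uncovered fault immediately fails the system. It is not the BDD-based method proposed in the paper. The function name `k_out_of_n_reliability`, the parameters `r` (component reliability) and `c` (coverage probability), and the assumption of identical, independent components with element-level coverage are all illustrative choices and are not taken from the paper.

```python
from math import comb


def k_out_of_n_reliability(n, k, r, c=1.0):
    """Reliability of a k-out-of-n:G structure with imperfect fault coverage.

    Assumptions (illustrative, not from the paper):
      - n identical, independent components, each with reliability r
      - each component fault is covered with probability c
      - a single uncovered fault fails the whole structure (pessimistic model)
    """
    if not (1 <= k <= n):
        raise ValueError("require 1 <= k <= n")
    covered_fault = (1.0 - r) * c  # probability that a component fails AND is covered
    # The structure survives with f covered faults as long as f <= n - k.
    return sum(
        comb(n, f) * r ** (n - f) * covered_fault ** f
        for f in range(n - k + 1)
    )


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        perfect = k_out_of_n_reliability(n, 1, 0.9)
        imperfect = k_out_of_n_reliability(n, 1, 0.9, c=0.9)
        print(f"1-out-of-{n}: perfect coverage {perfect:.5f}, c=0.9 {imperfect:.5f}")
```

With `r = 0.9` and `c = 0.9`, this model rates a 1-out-of-3 structure below a 1-out-of-2 structure, so under imperfect coverage additional redundancy can lower the computed reliability. Because the model treats every uncovered fault as fatal, it cannot credit an outer redundancy scheme for catching what an inner one missed, which is the pessimism the abstract refers to.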
