Reliability Importance of Components in a Real-Time Computing System with Standby Redundancy Schemes

Component importance analysis measures the effect of individual component reliabilities on system reliability and is used to guide system design from a reliability point of view. Redundancy is widely applied to guarantee high reliability in real-time computing systems, and standby redundancy is one of the most commonly used types. However, redundancy increases not only the complexity of a system but also the complexity of associated problems such as common-mode errors. In this paper, we consider component importance analysis of a real-time computing system with warm standby redundancy in the presence of common-cause failures (CCFs). Although CCFs are known to degrade system reliability, it is difficult to evaluate component importance measures analytically in their presence. This paper introduces a continuous-time Markov chain (CTMC) model for a real-time computing system and applies CTMC-based component-wise sensitivity analysis, which can evaluate component importance measures without requiring a structure function of the system. In numerical experiments, we evaluate the effect of CCFs by comparing the system performance measure and component importance for the system without CCFs against those for the system with CCFs. We also compare the effect of CCFs on the system in warm and hot standby configurations.

Keywords: Component importance measures, Standby redundancy, Real-time computing system, Common-cause failure, Markov chains.
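To make the idea of CTMC-based sensitivity concrete, the following is a minimal sketch and not the model used in the paper. It assumes a toy two-unit standby system without repair, with three states (two units up, one unit up, system down) and hypothetical rates `lam_a` (active unit), `lam_s` (standby unit) and `lam_c` (common-cause failure of both units at once). The function names and the finite-difference approach are illustrative assumptions; the paper's component-wise sensitivity analysis is not reproduced here.

```python
import math


def reliability(t, lam_a, lam_s, lam_c):
    """Reliability R(t) of a toy two-unit standby CTMC without repair.

    Assumed states: 2 units up -> 1 unit up -> system down (absorbing).
    lam_a: failure rate of the active unit
    lam_s: failure rate of the standby unit
           (lam_s < lam_a models warm standby, lam_s == lam_a hot standby)
    lam_c: common-cause failure rate taking both units down at once
    """
    out = lam_a + lam_s + lam_c          # total exit rate from "2 units up"
    p2 = math.exp(-out * t)              # P(still in "2 units up" at time t)
    gap = lam_s + lam_c                  # equals out - lam_a
    if gap == 0.0:
        # Cold standby without CCF: limit of the general expression below.
        p1 = lam_a * t * math.exp(-lam_a * t)
    else:
        # Solution of dP1/dt = (lam_a + lam_s) * P2 - lam_a * P1, P1(0) = 0.
        p1 = (lam_a + lam_s) * (math.exp(-lam_a * t) - p2) / gap
    return p2 + p1


def sensitivity(t, params, name, h=1e-7):
    """Central finite-difference estimate of dR(t)/d(params[name])."""
    up = dict(params)
    dn = dict(params)
    up[name] += h
    dn[name] -= h
    return (reliability(t, **up) - reliability(t, **dn)) / (2.0 * h)


if __name__ == "__main__":
    # Hypothetical rates (per hour), chosen only for illustration.
    warm = {"lam_a": 1e-3, "lam_s": 5e-4, "lam_c": 1e-4}
    hot = {"lam_a": 1e-3, "lam_s": 1e-3, "lam_c": 1e-4}
    t = 1000.0
    for label, p in (("warm", warm), ("hot", hot)):
        print(label, "R(t) =", reliability(t, **p))
        for name in p:
            print("  dR/d" + name, "=", sensitivity(t, p, name))
```

In this toy setting, comparing the derivative with respect to `lam_c` against those with respect to `lam_a` and `lam_s` gives a rough sense of how strongly a common-cause failure rate can influence reliability relative to the individual unit rates, in both the warm and hot configurations.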
