Reliability Analysis of Fault Tolerant Systems with Multi-Fault Coverage

Fault tolerance has been an essential architectural attribute for achieving high reliability in many critical applications of digital systems. Automatic fault- and error-handling mechanisms play a crucial role in implementing fault tolerance, because an uncovered (undetected) fault may lead to a system or subsystem failure even when adequate redundancy exists. Examples of this effect can be found in computing systems, electrical power distribution networks, and pipelines carrying dangerous materials. Because an uncovered fault may lead to overall system failure, an excessive level of redundancy may even reduce system reliability. Therefore, an accurate analysis must account not only for the system structure but also for the system's fault- and error-handling behavior (often called coverage behavior). The appropriate coverage modeling approach depends on the type of fault-tolerant techniques used. Recent research literature emphasizes the importance of multi-fault coverage models, in which the effectiveness of recovery mechanisms depends on the coexistence of multiple faults in a group of elements. These groups, called fault level coverage (FLC) groups, collectively participate in detecting and recovering from the faults within the group. However, methods for solving multi-fault coverage models are limited, primarily because of the complex dependency introduced by the reconfiguration mechanisms. This paper proposes a modification of the generalized reliability block diagram (RBD) method for evaluating reliability indices of systems with multi-fault coverage. The proposed method, based on the universal generating function technique, computes the reliability indices of complex systems with multi-fault coverage using a straightforward recursive procedure. The algorithm also applies readily when FLC groups form a hierarchical structure. Illustrative examples are presented.
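To make the FLC idea concrete, here is a minimal sketch for a single k-out-of-n:G FLC group; it is not the paper's algorithm. It assumes independent element failures with probabilities `q[i]`, and a hypothetical coverage vector `c`, where `c[j-1]` is the probability that the j-th fault in the group is covered given that j-1 faults already exist. The group survives only if at most n-k elements have failed and every one of those faults was covered. The distribution of the number of failed elements is built in the universal generating function style, by multiplying per-element polynomials (1-q_i)z^0 + q_i z^1. The function names are illustrative only.

```python
from math import prod


def failure_count_ufunction(q):
    """Return dist, where dist[j] = P(exactly j elements of the group have failed).

    Sketch assumption: element failures are independent. Each element contributes
    the polynomial (1 - q_i) z^0 + q_i z^1, and the group polynomial is their product.
    """
    dist = [1.0]  # empty group: zero failures with probability 1
    for qi in q:
        new = [0.0] * (len(dist) + 1)
        for j, p in enumerate(dist):
            new[j] += p * (1.0 - qi)  # element works: failure count unchanged
            new[j + 1] += p * qi      # element fails: failure count grows by one
        dist = new
    return dist


def flc_group_reliability(q, c, k):
    """Reliability of a k-out-of-n:G group under a fault-level-coverage model.

    Hypothetical parameters: q[i] is the failure probability of element i, and
    c[j-1] is the coverage probability of the j-th fault given j-1 prior faults.
    The group works if at most n-k elements failed and all of those faults were covered.
    """
    n = len(q)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if len(c) < n - k:
        raise ValueError("need a coverage factor for each of the first n-k faults")
    dist = failure_count_ufunction(q)
    # math.prod of an empty slice is 1, so the fault-free state needs no coverage.
    return sum(dist[j] * prod(c[:j]) for j in range(n - k + 1))


if __name__ == "__main__":
    # Three elements with q = 0.1, one of which must work; coverage degrades with each fault.
    print(round(flc_group_reliability([0.1, 0.1, 0.1], [0.9, 0.8], k=1), 5))  # 0.96714
```

With perfect coverage (`c = [1.0, 1.0]`) the same group gives 1 - 0.1^3 = 0.999, so the gap to 0.96714 is the cost of imperfect multi-fault coverage. The sketch deliberately does not separate covered from uncovered group failures. As the abstract notes, an uncovered fault may bring down the whole system even when redundancy remains, so groups cannot simply be combined in parallel using this single number.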