A Design Diversity Metric and Analysis of Redundant Systems

Redundant systems are designed using multiple copies of the same resource (e.g., a logic network or a software module) in order to increase system dependability. Design diversity has long been used to protect redundant systems from common-mode failures. The conventional notion of diversity relies on "independent" generation of "different" implementations. This concept is qualitative and does not provide a basis for comparing the reliabilities of two diverse systems. In this paper, for the first time, we present a metric to quantify diversity among several designs and illustrate its effectiveness using several examples. We also demonstrate applications of this metric to analyzing the reliability and availability of diverse redundant systems, and to deriving simple relationships among diversity, system failure rate, and mission time.
