Quantifying the reliability of proven SPIDER group membership service guarantees

For safety-critical systems, it is essential to quantify the reliability of the assumptions that underlie proven guarantees. We investigate the reliability of the assumptions of the SPIDER group membership service with respect to transient and permanent faults. Across 12,600 modeled system configurations, the probability that SPIDER's maximum fault assumption fails to hold during a one-hour mission ranges from below $10^{-11}$ to above $10^{-3}$. For the range of transient fault arrival rates expected in aerospace systems, a transient fault tolerance strategy was superior, in most cases examined, to the permanent fault tolerance strategy previously in use. The reliability of the maximum fault assumption (upon which the proofs are based) differs greatly depending on whether the faults are asymmetric, symmetric, or benign. This case study demonstrates the benefits of quantifying the reliability of assumptions for proven properties.
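To illustrate what "the probability that a maximum fault assumption does not hold" means, the sketch below computes it for a deliberately simplified model. It is not the model used in the study. It assumes independent, identical nodes with a constant (exponential) permanent fault rate, no recovery from transient faults, and a single fault class, whereas SPIDER distinguishes asymmetric, symmetric, and benign faults. The function names, the node count, the fault bound, and the fault rate are all hypothetical values chosen for illustration.

```python
import math


def node_fault_probability(rate_per_hour: float, mission_hours: float) -> float:
    """Probability that one node becomes faulty during the mission.

    Assumes a constant fault arrival rate (exponential model).
    expm1 keeps precision when rate * time is very small.
    """
    return -math.expm1(-rate_per_hour * mission_hours)


def assumption_violation_probability(
    n_nodes: int, max_faults: int, rate_per_hour: float, mission_hours: float
) -> float:
    """Probability that more than max_faults of n_nodes are faulty.

    Assumes independent, identical nodes (a simplification that is not
    taken from the paper). The result is the upper tail of a binomial
    distribution, summed term by term.
    """
    p = node_fault_probability(rate_per_hour, mission_hours)
    return sum(
        math.comb(n_nodes, k) * p**k * (1.0 - p) ** (n_nodes - k)
        for k in range(max_faults + 1, n_nodes + 1)
    )


if __name__ == "__main__":
    # Hypothetical configuration: 4 nodes, 1-hour mission, 1e-5 faults/hour.
    for tolerated in (0, 1, 2):
        prob = assumption_violation_probability(4, tolerated, 1e-5, 1.0)
        print(f"tolerating {tolerated} fault(s): P(violation) = {prob:.3e}")
```

Even in this toy model, each additional tolerated fault lowers the violation probability by several orders of magnitude. This shows how a spread as wide as the one reported in the abstract can arise across configurations.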
