Performability Analysis of Two Approaches to Fault Tolerance

We present a quantitative comparison of two popular approaches for recovering from CPU errors: Quadruple Modular Redundancy and Backward Error Recovery. Both are used in existing fault-tolerant systems that offer essentially the same main features and, in particular, the same fault-tolerance service: transparent recovery from hardware faults. We show that performability measures provide a richer analysis than classical dependability measures. Because they take into account not only reliability aspects but also performance metrics, they give deeper insight into the behaviour of the systems considered. For instance, they allow the user to identify the ranges of mission length for which each type of architecture is better suited.
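As a minimal sketch of what a performability measure computes, assuming the common Markov reward model formulation (a continuous-time Markov chain whose states carry reward rates, such as relative processing capacity), the following Python computes the expected reward accumulated over a mission of length t by uniformization. The names `expected_cumulative_reward` and `single_unit_model`, and all rates and rewards below, are hypothetical and for illustration only. They are not the models or parameters of the two architectures studied here.

```python
import math


def expected_cumulative_reward(Q, rewards, pi0, t, eps=1e-12, max_terms=100_000):
    """Expected reward accumulated over [0, t] by a CTMC with generator Q.

    Uniformization: E[Y(t)] = sum_n (pi0 P^n r) * P(N > n) / lam,
    where P = I + Q / lam and N ~ Poisson(lam * t).
    """
    n = len(Q)
    # Uniformization rate: the largest exit rate of any state.
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0.0:
        # No transitions at all: the initial distribution never changes.
        return t * sum(p * r for p, r in zip(pi0, rewards))
    # Uniformized discrete-time transition matrix P = I + Q / lam.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    lt = lam * t
    weight = math.exp(-lt)  # P(N = 0); direct exp is fine only for moderate lam * t
    cdf = weight            # P(N <= k), updated as k grows
    v = list(pi0)           # state distribution of the DTMC after k steps: pi0 P^k
    total = 0.0
    for k in range(max_terms):
        tail = 1.0 - cdf    # P(N > k)
        total += tail * sum(p * r for p, r in zip(v, rewards)) / lam
        if tail < eps:
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lt / (k + 1)
        cdf += weight
    return total


def single_unit_model(failure_rate, reward_rate):
    # Hypothetical 2-state model: state 0 is operational and earns reward_rate,
    # state 1 is failed (absorbing) and earns nothing.
    Q = [[-failure_rate, failure_rate], [0.0, 0.0]]
    return Q, [reward_rate, 0.0], [1.0, 0.0]


if __name__ == "__main__":
    # Made-up parameters chosen only to illustrate a mission-length crossover:
    # "A" runs at full speed but fails more often, "B" pays a 10% overhead
    # but fails half as often.
    archs = {"A": single_unit_model(0.01, 1.0), "B": single_unit_model(0.005, 0.9)}
    for t in (1.0, 10.0, 100.0, 1000.0):
        per_time = {name: expected_cumulative_reward(*m, t) / t for name, m in archs.items()}
        print(t, per_time)
```

With these invented numbers, B always has the higher reliability, yet A delivers more expected work per unit time on short missions and B only wins on long ones, which is the kind of mission-length dependence a reliability figure alone does not expose. The direct `math.exp(-lt)` limits this sketch to moderate values of λt; larger models call for a more careful computation of the Poisson weights.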
