Dual use of superscalar datapath for transient-fault detection and recovery

Diminutive devices and high clock frequency of future microprocessor generations are causing increased concerns for transient soft failures in hardware, necessitating fault detection and recovery mechanisms even in commodity processors. In this paper, we propose a fault-tolerant extension for modern superscalar out-of-order datapath that can be supported by only modest additional hardware. In the proposed extensions, error-detection is achieved by verifying the redundant results of dynamically replicated threads of executions, while the error-recovery scheme employs the instruction-rewind mechanism to restart at a failed instruction. We study the performance impact of augmenting superscalar microarchitectures with this fault tolerance mechanism. An analytical performance model is used in conjunction with a performance simulator The simulation results of 11 SPEC95 and SPEC2000 benchmarks show that in the absence of faults, error detection causes a 2% to 45% reduction in throughput, which is in line with other proposed detection schemes. In the presence of transient faults, the fast error recovery scheme contributes very little additional slowdown.

[1]  Janak H. Patel,et al.  Concurrent Error Detection in ALU's by Recomputing with Shifted Operands , 1982, IEEE Transactions on Computers.

[2]  Neeraj Suri,et al.  Designing high-performance and reliable superscalar architectures-the out of order reliable superscalar (O3RS) approach , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[3]  Federico Faccio,et al.  Single event effects in static and dynamic registers in a 0.25 /spl mu/m CMOS technology , 1999 .

[4]  Gurindar S. Sohi,et al.  Instruction Issue Logic for High-Performance Interruptible, Multiple Functional Unit, Pipelines Computers , 1990, IEEE Trans. Computers.

[5]  Z. Hasnain,et al.  Building-in reliability: soft errors-a case study , 1992, 30th Annual Proceedings Reliability Physics 1992.

[6]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[7]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[8]  M.J. Flynn,et al.  Deep submicron microprocessor design issues , 1999, IEEE Micro.

[9]  Manoj Franklin A study of time redundant fault tolerance techniques for superscalar processors , 1995, Proceedings of International Workshop on Defect and Fault Tolerance in VLSI.

[10]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[11]  Eric Rotenberg,et al.  Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.

[12]  F. Faccio,et al.  Single Event Effects in Static and Dynamic Registers in a 0 . 25 μ m CMOS Technology , 1999 .

[13]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[14]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[15]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[16]  Eric Rotenberg,et al.  A study of slipstream processors , 2000, MICRO 33.

[17]  E. Normand Single event upset at ground level , 1996 .

[18]  Lorena Anghel,et al.  Self-checking circuits versus realistic faults in very deep submicron , 2000, Proceedings 18th IEEE VLSI Test Symposium.

[19]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[20]  T. May,et al.  A New Physical Mechanism for Soft Errors in Dynamic Memories , 1978, 16th International Reliability Physics Symposium.

[21]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.