SSD: an affordable fault tolerant architecture for superscalar processors

The paper proposes an integrity checking architecture for superscalar processors that can achieve fault tolerance capability of a duplex system at much less cost than the traditional duplication approach. The pipeline of the CPU core (P-pipeline) is combined in series with another pipeline (V-pipeline), which re-executes instructions processed in the P-pipeline. Operations in the two pipelines are compared and any mismatch triggers the recovery process. The V-pipeline design is based on replication of the P-pipeline, and minimized in size and functionality by taking advantage of control flow and data dependency resolved in the P-pipeline. Idle cycles propagated from the P-pipeline become extra time for the V-pipeline to keep up with program re-execution. For a large-scale superscalar processor, the proposed architecture can bring up to 61.4% reduction in die area and the average-execution time increase is 0.3%.

[1]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[2]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[3]  Shekhar Y. Borkar,et al.  Design challenges of technology scaling , 1999, IEEE Micro.

[4]  Kenneth C. Yeager,et al.  200-MHz superscalar RISC microprocessor , 1996, IEEE J. Solid State Circuits.

[5]  Arun K. Somani,et al.  REESE: a method of soft error detection in microprocessors , 2001, 2001 International Conference on Dependable Systems and Networks.

[6]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[7]  Harsh Sharangpani,et al.  Itanium Processor Microarchitecture , 2000, IEEE Micro.

[8]  Richard E. Kessler,et al.  The Alpha 21264 microprocessor , 1999, IEEE Micro.

[9]  Gene W. Shen,et al.  SPARC64: a 64-b 64-active-instruction out-of-order-execution MCM processor , 1995 .

[10]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[11]  P. K. Lala Self-Checking and Fault-Tolerant Digital Design , 1995 .

[12]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[13]  Michael Nicolaidis Design for soft-error robustness to rescue deep submicron scaling , 1998, Proceedings International Test Conference 1998 (IEEE Cat. No.98CH36270).

[14]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).