Characterizing the effects of transient faults on a high-performance processor pipeline

The progression of implementation technologies into the sub-100 nanometer lithographies renew the importance of understanding and protecting against single-event upsets in digital systems. In this work, the effects of transient faults on high performance microprocessors is explored. To perform a thorough exploration, a highly detailed register transfer level model of a deeply pipelined, out-of-order microprocessor was created. Using fault injection, we determined that fewer than 15% of single bit corruptions in processor state result in software visible errors. These failures were analyzed to identify the most vulnerable portions of the processor, which were then protected using simple low-overhead techniques. This resulted in a 75% reduction in failures. Building upon the failure modes seen in the microarchitecture, fault injections into software were performed to investigate the level of masking that the software layer provides. Together, the baseline microarchitectural substrate and software mask more than 9 out of 10 transient faults from affecting correct program execution.

[1]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[2]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[3]  Arun K. Somani,et al.  Soft error sensitivity characterization for microprocessor dependability enhancement strategy , 2002, Proceedings International Conference on Dependable Systems and Networks.

[4]  Elizabeth M. Rudnick,et al.  A Gate-Level Simulation Environment for Alpha-Particle-Induced Transient Faults , 1996, IEEE Trans. Computers.

[5]  Hiroyuki Sugiyama,et al.  A 1.3 GHz fifth generation SPARC64 microprocessor , 2003, 2003 IEEE International Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC..

[6]  S. McFarling Combining Branch Predictors , 1993 .

[7]  Todd M. Austin,et al.  A fault tolerant approach to microprocessor design , 2001, 2001 International Conference on Dependable Systems and Networks.

[8]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[9]  M. Baze,et al.  Attenuation of single event induced pulses in CMOS combinational logic , 1997 .

[10]  M. Baze,et al.  Comparison of error rates in combinational and sequential logic , 1997 .

[11]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[12]  Sanjay J. Patel,et al.  Y-branches: when you come to a fork in the road, take it , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[13]  Jiri Gaisler A portable and fault-tolerant microprocessor based on the SPARC v8 architecture , 2002, Proceedings International Conference on Dependable Systems and Networks.

[14]  T.H. Lee,et al.  A 600 MHz superscalar RISC microprocessor with out-of-order execution , 1997, 1997 IEEE International Solids-State Circuits Conference. Digest of Technical Papers.

[15]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[16]  Eric Rotenberg,et al.  Exploiting Large Ineffectual Instruction Sequences , 1999 .

[17]  Manoj Franklin Incorporating fault tolerance in superscalar processors , 1996, Proceedings of 3rd International Conference on High Performance Computing (HiPC).

[18]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[19]  Yogi Berra,et al.  When You Come to a Fork in the Road, Take It! , 2001 .

[20]  Daniel P. Siewiorek,et al.  Effects of transient gate-level faults on program behavior , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[21]  Johan Karlsson,et al.  On latching probability of particle induced transients in combinational networks , 1994, Proceedings of IEEE 24th International Symposium on Fault- Tolerant Computing.