On the Correlation between Controller Faults and Instruction-Level Errors in Modern Microprocessors

We investigate the correlation between register transfer-level faults in the control logic of a modern microprocessor and their instruction-level impact on the execution flow of typical programs. Such information can prove immensely useful in accurately assessing and prioritizing faults with regards to their criticality, as well as commensurately allocating resources to enhance testability, diagnosability, manufacturability and reliability. To this end, we developed an extensive infrastructure which allows injection of stuck-at faults and transient errors of arbitrary starting point and duration, as well as cost-effective simulation and classification of their repercussions into various instruction-level error types. As a test vehicle for our study, we employ a superscalar, dynamically-scheduled, out-of-order, Alpha-like microprocessor, on which we execute SPEC2000 integer benchmarks. Extensive experimentation with faults injected in control logic modules of this microprocessor reveals interesting trends and results, corroborating the utility of this simulation infrastructure and motivating its further development and application to various tasks related to robust design.

[1]  Yiorgos Makris,et al.  Entropy-driven parity-tree selection for low-overhead concurrent error detection in finite state machines , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2]  Lloyd W. Massengill,et al.  Impact of scaling on soft-error rates in commercial microprocessors , 2002 .

[3]  Elizabeth M. Rudnick,et al.  A Gate-Level Simulation Environment for Alpha-Particle-Induced Transient Faults , 1996, IEEE Trans. Computers.

[4]  Jan Torin,et al.  Evaluating processor-behavior and three error-detection mechanisms using physical fault-injection , 1995 .

[5]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[6]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[7]  Neeraj Suri,et al.  Designing high-performance and reliable superscalar architectures-the out of order reliable superscalar (O3RS) approach , 2000, Proceeding International Conference on Dependable Systems and Networks. DSN 2000.

[8]  Peter Hazucha,et al.  Characterization of soft errors caused by single event upsets in CMOS processes , 2004, IEEE Transactions on Dependable and Secure Computing.

[9]  Doug Burger,et al.  Evaluating Future Microprocessors: the SimpleScalar Tool Set , 1996 .

[10]  Jacob A. Abraham,et al.  FERRARI: A Flexible Software-Based Fault and Error Injection System , 1995, IEEE Trans. Computers.

[11]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[12]  N. D. Durie,et al.  Digest of papers , 1976 .

[13]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[14]  Sanjay J. Patel,et al.  ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[15]  Johan Karlsson,et al.  Fault injection into VHDL models: the MEFISTO tool , 1994 .

[16]  Edward J. McCluskey,et al.  Which concurrent error detection scheme to choose ? , 2000, Proceedings International Test Conference 2000 (IEEE Cat. No.00CH37159).

[17]  Ravishankar K. Iyer,et al.  Characterization of linux kernel behavior under errors , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[18]  Daniel P. Siewiorek,et al.  Effects of transient gate-level faults on program behavior , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[19]  Marcus Rimén,et al.  A Study of the Error Behavior of a 32-bit RISC Subjected to Simulated Transient Fault Injection , 1992, Proceedings International Test Conference 1992.

[20]  Steffen Graf,et al.  Error Detection Circuits , 1993 .

[21]  Sara Blanc,et al.  Enhancement of Fault Injection Techniques Based on the Modification of VHDL Code , 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[22]  Cecilia Metra,et al.  On-line detection of logic errors due to crosstalk, delay, and transient faults , 1998, Proceedings International Test Conference 1998 (IEEE Cat. No.98CH36270).

[23]  Nur A. Touba,et al.  Cost-effective approach for reducing soft error failure rate in logic circuits , 2003, International Test Conference, 2003. Proceedings. ITC 2003..