A Methodology for Stochastic Fault Simulation in VLSI Processor Architectures

We present a simulation methodology for fault analysis in VLSI designs. Our approach uses stochastic fault injection to reduce the simulation time that fault analysis requires. By sampling a fraction of the available injection points, we obtain a statistical characterization of the design under test without exhaustively testing every gate in the circuit at every point in time. The approach targets fault characterization of large-scale circuits, including full CPU designs that would normally be too complex for traditional fault-analysis techniques. We present two methods for performing stochastic fault injection. The first requires modifying the component library and is implemented entirely in VLSI source code. The second leaves the component libraries unmodified; instead, it interacts directly with the simulation environment through the programming language interface (PLI). We compare the simulation cost of each method as well as the trade-offs in design analysis. Furthermore, we introduce an optimization for fault analysis that uses the checkpointing feature typically found in VLSI simulation environments. This optimization eliminates the need for multiple simulations of a single benchmark program with different random seeds. Our techniques can be implemented in any VLSI simulation environment that supports PLI and are compatible with both VHDL and Verilog designs. We present our algorithms and demonstrate their use on the fully implemented OpenRISC processor [2]. Finally, we validate our method by demonstrating that accuracy increases with the number of injected faults. Even with as few as 250 faults, we see at most an 8 percent difference in the standard deviation, and as the number of faults increases, the accuracy quickly approaches 100 percent.
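
The sampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the uniform sampling over (gate, cycle) pairs, and the binomial standard-error estimate are all assumptions chosen for illustration, and the actual fault injection into a simulator (via library modification or PLI) is left out entirely.

```python
import math
import random


def sample_injection_points(num_gates, num_cycles, num_faults, seed=0):
    """Pick num_faults distinct (gate, cycle) pairs uniformly at random.

    Hypothetical helper: the injection space is assumed to be the full
    gate x cycle grid, which the abstract does not specify.
    """
    rng = random.Random(seed)
    total = num_gates * num_cycles
    # Sample flat indices without replacement, then decode each index.
    flat = rng.sample(range(total), num_faults)
    return [(i // num_cycles, i % num_cycles) for i in flat]


def estimate_failure_rate(outcomes):
    """Return (proportion of failures, binomial standard error).

    outcomes is a list of booleans, one per injected fault, where True
    means the fault produced an observable failure.
    """
    n = len(outcomes)
    p = sum(outcomes) / n
    stderr = math.sqrt(p * (1 - p) / n)
    return p, stderr


# Example: 250 faults drawn from a hypothetical 10,000-gate design
# simulated for 1,000 cycles.
points = sample_injection_points(10_000, 1_000, 250)
```

In this sketch the standard error shrinks in proportion to one over the square root of the number of injected faults, which is one simple way to see why the estimate would tighten as more faults are injected.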

[1] Joel Emer, et al. A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor, 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.

[2] David J. Frank, et al. Nanoscale CMOS, 1999, Proc. IEEE.

[3] Lloyd W. Massengill, et al. Impact of scaling on soft-error rates in commercial microprocessors, 2002.

[4] Babak Falsafi, et al. Fingerprinting: bounding soft-error-detection latency and bandwidth, 2004, IEEE Micro.

[5] Sanjay J. Patel, et al. Characterizing the effects of transient faults on a high-performance processor pipeline, 2004, International Conference on Dependable Systems and Networks, 2004.

[6] Robert M. McDermott. Random Fault Analysis, 1981, 18th Design Automation Conference.

[7] C. Constantinescu. Estimation of the coverage probabilities by 3-stage sampling, 1995, Annual Reliability and Maintainability Symposium 1995 Proceedings.

[8] Sujit Dey, et al. A scalable soft spot analysis methodology for compound noise effects in nano-meter circuits, 2004, Proceedings. 41st Design Automation Conference, 2004.

[9] Lorenzo Alvisi, et al. Modeling the effect of technology trends on the soft error rate of combinational logic, 2002, Proceedings International Conference on Dependable Systems and Networks.

[10] Nur A. Touba, et al. Cost-effective approach for reducing soft error failure rate in logic circuits, 2003, International Test Conference, 2003. Proceedings. ITC 2003.

[11] Sami A. Al-Arian, et al. Fault simulation and test generation by fault sampling techniques, 1992, Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors.

[12] Arun K. Somani, et al. Soft error sensitivity characterization for microprocessor dependability enhancement strategy, 2002, Proceedings International Conference on Dependable Systems and Networks.

[13] Jacob A. Abraham, et al. FERRARI: A Flexible Software-Based Fault and Error Injection System, 1995, IEEE Trans. Computers.

[14] Barry W. Johnson, et al. Coverage Estimation Using Statistics of the Extremes for When Testing Reveals No Failures, 2002, IEEE Trans. Computers.

[15] Ibrahim N. Hajj, et al. The complexity of fault detection in MOS VLSI circuits, 1990, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst.

[16] Jean Arlat, et al. Coverage Estimation Methods for Stratified Fault Injection, 1999, IEEE Trans. Computers.

[17] Jean Arlat, et al. Estimators for Fault Tolerance Coverage Evaluation, 1995, IEEE Trans. Computers.

[18] J. Patel, et al. A Gate-Level Simulation Environment for Alpha-Particle-Induced Transient Faults, 1996, IEEE Trans. Computers.

[19] Andreas Steininger, et al. Dealing with dormant faults in an embedded fault-tolerant computer system, 2003, IEEE Trans. Reliab.

[20] Dan Alexandrescu, et al. New methods for evaluating the impact of single event transients in VDSM ICs, 2002, 17th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2002. DFT 2002. Proceedings.

[21] Jean Arlat, et al. Fault Injection for Dependability Validation: A Methodology and Some Applications, 1990, IEEE Trans. Software Eng.

[22] Johan Karlsson, et al. Fault injection into VHDL models: the MEFISTO tool, 1994.

[23] Raphael R. Some, et al. A software-implemented fault injection methodology for design and validation of system fault tolerance, 2001, 2001 International Conference on Dependable Systems and Networks.

[24] Daniel P. Siewiorek, et al. FIAT-fault injection based automated testing environment, 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.