Transient fault models and AVF estimation revisited

Transient faults (also known as soft errors) caused by high-energy particle strikes on silicon are typically modeled as single bit-flips in memory arrays, and most Architectural Vulnerability Factor (AVF) analyses assume this model. However, accelerated radiation tests on static random access memory (SRAM) arrays built in modern technologies show evidence of clustered upsets caused by a single particle strike. In this paper, these observations are used to define a scalable fault model capable of representing fault multiplicities, that is, the number of bits upset by one strike. Based on this model, a probabilistic framework is proposed for incorporating the vulnerability of SRAM arrays to different fault multiplicities into AVF. An experimental fault injection setup, built on a detailed microarchitectural simulator running generic benchmarks, is used to demonstrate vulnerability characterization under the new fault model. Further, rigorous fault injection shows that conventional AVF estimation methods overestimate vulnerability by up to 7× for some structures.
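The abstract does not give the framework's equations. The sketch below assumes a simple reading of it: one strike upsets m physically adjacent bits with probability P(m), and AVF_m is the vulnerability measured by injecting m-bit clustered faults. Under that assumption, the combined AVF is the expectation of AVF_m over m. The function names `inject_cluster` and `combined_avf`, the contiguous-cluster shape, and the numbers in the usage block are hypothetical illustrations. They are not the paper's method or measured data.

```python
import random


def inject_cluster(word: int, width: int, m: int, rng: random.Random) -> int:
    """Flip m physically adjacent bits of a width-bit word.

    Hypothetical clustered-upset model: m = 1 reproduces the classic
    single-bit-flip model; m > 1 models a multi-bit upset from one strike.
    """
    if not 1 <= m <= width:
        raise ValueError("multiplicity m must satisfy 1 <= m <= width")
    start = rng.randrange(width - m + 1)  # lowest bit of the cluster, chosen so it fits
    mask = ((1 << m) - 1) << start        # m contiguous ones starting at `start`
    return word ^ mask                    # XOR flips exactly the masked bits


def combined_avf(mult_prob: dict[int, float], avf_by_mult: dict[int, float]) -> float:
    """Expected AVF over fault multiplicities: sum over m of P(m) * AVF_m.

    mult_prob[m]   -- assumed probability that one strike upsets m bits
    avf_by_mult[m] -- assumed AVF measured with m-bit clustered injections
    """
    if abs(sum(mult_prob.values()) - 1.0) > 1e-9:
        raise ValueError("multiplicity probabilities must sum to 1")
    return sum(p * avf_by_mult[m] for m, p in mult_prob.items())


if __name__ == "__main__":
    rng = random.Random(42)
    faulty = inject_cluster(0, 32, 3, rng)
    print(f"{faulty:032b}")  # exactly three adjacent 1s somewhere in the word

    # Illustrative numbers only, not data from the paper.
    p = {1: 0.90, 2: 0.08, 3: 0.02}
    avf = {1: 0.30, 2: 0.20, 3: 0.10}
    print(round(combined_avf(p, avf), 3))  # prints 0.288
```

Because the single-bit model is just the m = 1 case of `inject_cluster`, a conventional single-bit injection campaign corresponds to setting P(1) = 1 in this sketch.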
