Assessing SEU Vulnerability via Circuit-Level Timing Analysis

Recently, there has been a growing concern that, in relation to process technology scaling, the soft-error rate will become a major challenge in designing reliable systems. In this work, we introduce a high-fidelity, high-performance simulation infrastructure for quantifying the derating effects on soft-error rates while considering microarchitectural, timing and logic-related masking, using realistic workloads on a CMP switch design. We use a gate-level model for the CMP switch design, enabling us to inject faults into blocks of combinational logic. We are then able to track logic-related and time-related fault masking, as well as microarchitecturalrelated fault masking, at the architecture level. We find out that for complex designs, logic-and time-related fault masking account for more than 50% of the masked faults. We also observe that only 3-4% of the injected faults propagate an error at the design’s output and cause an error in the application’s execution, resulting in a derating factor of 30. From our experiments, we also demonstrate that soft-error derating effects highly depend on the design’s characteristics

[1]  Li-Shiuan Peh,et al.  Flow control and micro-architectural mechanisms for extending the performance of interconnection networks , 2001 .

[2]  Ming Zhang,et al.  A soft error rate analysis (SERA) methodology , 2004, ICCAD 2004.

[3]  James L. Walsh,et al.  IBM experiments in soft fails in computer electronics (1978-1994) , 1996, IBM J. Res. Dev..

[4]  J. Maiz,et al.  Characterization of multi-bit soft error events in advanced SRAMs , 2003, IEEE International Electron Devices Meeting 2003.

[5]  Alan Messer,et al.  Susceptibility of commodity systems and software to memory soft errors , 2004, IEEE Transactions on Computers.

[6]  Babak Falsafi,et al.  Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[7]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[8]  Jaehyuk Huh,et al.  Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture , 2003, IEEE Micro.

[9]  T. May,et al.  Alpha-particle-induced soft errors in dynamic memories , 1979, IEEE Transactions on Electron Devices.

[10]  Arun K. Somani,et al.  Soft error sensitivity characterization for microprocessor dependability enhancement strategy , 2002, Proceedings International Conference on Dependable Systems and Networks.

[11]  James F. Ziegler,et al.  Terrestrial cosmic rays , 1996, IBM J. Res. Dev..

[12]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[13]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[14]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[15]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[16]  S. Narendra,et al.  Measurements and analysis of SER-tolerant latch in a 90-nm dual-V/sub T/ CMOS process , 2004, IEEE Journal of Solid-State Circuits.

[17]  T. N. Vijaykumar,et al.  Opportunistic Transient-Fault Detection , 2006, IEEE Micro.

[18]  T. Juhnke,et al.  Calculation of the Soft Error Rate of Submicron CMOS Logic Circuits , 1994, ESSCIRC '94: Twientieth European Solid-State Circuits Conference.

[19]  Ravishankar K. Iyer,et al.  Microprocessor sensitivity to failures: control vs. execution and combinational vs. sequential logic , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[20]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).