Accelerated microarchitectural Fault Injection-based reliability assessment

Statistical Fault Injection on microarchitectural simulators can provide early and accurate reliability characterization for array based hardware components. Besides, microarchitectural fault injectors are easily configurable (facilitating many reliability studies) and orders of magnitude faster than RTL fault injectors, rendering them appropriate tools for early reliability estimation using large and realistic benchmarks. However, the throughput of the fault injection campaigns on microarchitectural simulators remains a bottleneck when a batch of campaigns must run for early reliability estimation of a processor (different microarchitectural characteristics, different workloads). This paper presents two different operation modes on top of a baseline framework for statistical fault injection campaigns, trading off between accuracy and speedup of the injection campaigns with a state-of-the-art out-of-order full-system ×86-64 simulator as experimental vehicle. In the first mode, the injection experiments are stopped and classified as masked due to the following conditions: (i) the fault is over-written after the injection and it hasn't been read earlier, (ii) or the fault is injected on an invalid entry. The second mode has the same termination conditions as the first mode, but the injection experiments can also be terminated when an instruction that has read the faulty entry passes through the commit stage of the ×86-64 out-of-order architecture. In the first mode, we observed a speedup up to 2.92× with no loss of accuracy in the vulnerability measurements for all structures. In the second mode an even higher speedup of up to 4.06× has been obtained with small loss in the accuracy of the vulnerability measurements.

[1]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[2]  Cristian Constantinescu,et al.  Trends and Challenges in VLSI Circuit Reliability , 2003, IEEE Micro.

[3]  Shubhendu S. Mukherjee,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[4]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[5]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[6]  Matt T. Yourst PTLsim: A Cycle Accurate Full System x86-64 Microarchitectural Simulator , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.

[7]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[8]  Régis Leveugle,et al.  Statistical fault injection: Quantified error and confidence , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.

[9]  Amin Ansari,et al.  Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS 2010.

[10]  Amin Ansari,et al.  Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS XV.

[11]  John Lach,et al.  Transient fault models and AVF estimation revisited , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[12]  Lieven Eeckhout,et al.  AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  Yu Cao,et al.  A resilience roadmap , 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).

[14]  Shunfei Chen,et al.  MARSS: A full system simulator for multicore x86 CPUs , 2011, 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC).

[15]  Alfredo Benso,et al.  Statistical Reliability Estimation of Microprocessor-Based Systems , 2012, IEEE Transactions on Computers.

[16]  Stijn Eyerman,et al.  A first-order mechanistic model for architectural vulnerability factor , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[17]  Sarita V. Adve,et al.  Relyzer: exploiting application-level fault equivalence to analyze application resiliency to transient faults , 2012, ASPLOS XVII.

[18]  Michel Dubois,et al.  MACAU: A Markov model for reliability evaluations of caches under Single-bit and Multi-bit Upsets , 2012, IEEE International Symposium on High-Performance Comp Architecture.

[19]  Charles C. Weems,et al.  Characterizing the microarchitectural side effects of operating system calls , 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).

[20]  B. Jacob,et al.  AN INTEgRATED SIMulATIoN INfRASTRuCTuRE foR THE ENTIRE MEMoRy HIERARCHy: CACHE, DRAM, NoNVolATIlE MEMoRy, AND DISk , 2013 .

[21]  Bruce Jacob,et al.  Technology comparison for large last-level caches (L3Cs): Low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[22]  Jacob A. Abraham,et al.  Quantitative evaluation of soft error injection techniques for robust system design , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[23]  Sarita V. Adve,et al.  GangES: Gang error simulation for hardware resiliency evaluation , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[24]  Antonio Miele,et al.  A fault-injection methodology for the system-level dependability analysis of multiprocessor embedded systems , 2014, Microprocess. Microsystems.

[25]  Dimitris Gizopoulos,et al.  Versatile architecture-level fault injection framework for reliability evaluation: A first report , 2014, 2014 IEEE 20th International On-Line Testing Symposium (IOLTS).