Masking Soft Errors with Static Bitwise Analysis

With continuous improvements in VLSI technology, soft errors have made the dependability of computing an increasingly serious design challenge. Current protection techniques usually incur significant economic cost, performance degradation, or resource consumption. This paper introduces a lightweight software approach for mitigating soft errors. It exploits the fact that many data values have a narrow width or constant bits, so a large fraction of their binary bits are unused or constant and can be predicted before program execution. First, two bitwise data-flow analyses identify invariants concerning bit-level data widths and values. Based on these results, masking operations are inserted to clear possible errors in the known-value bits, which reduces the window of vulnerability. As a result, program reliability improves with minimal penalty. To improve effectiveness, a covered mask analysis removes non-vital masking operations without affecting dependability. We have implemented our approach in the LLVM compiler. Fault injection experiments on the MiBench benchmarks indicate that our approach improves program reliability by 8.03% while incurring only 1.61% performance overhead.
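The masking idea can be illustrated with a minimal sketch. This is not the paper's LLVM implementation; it is a Python simulation under stated assumptions: a 32-bit word, an unsigned value whose range is known statically, and a single-bit flip as the fault model. All names (`narrow_width_known_bits`, `flip_bit`, `repair`, `masked_fraction`) are hypothetical and chosen for illustration only.

```python
WORD_BITS = 32                      # assumed register width
WORD_MASK = (1 << WORD_BITS) - 1


def narrow_width_known_bits(max_value):
    """Known bits for an unsigned value in [0, max_value].

    Every bit above the value's width is known to be 0.
    Returns (known_mask, known_value); a set bit in known_mask means
    the corresponding bit of the word is statically known.
    """
    used = (1 << max_value.bit_length()) - 1
    return (~used) & WORD_MASK, 0


def flip_bit(word, bit):
    """Simulate a soft error: flip one bit of a word."""
    return (word ^ (1 << bit)) & WORD_MASK


def repair(word, known_mask, known_value):
    """Masking operation: force every known bit back to its known value."""
    return (word & ~known_mask & WORD_MASK) | (known_value & known_mask)


def masked_fraction(word, known_mask, known_value):
    """Fraction of single-bit flips that the masking operation corrects."""
    repaired = sum(
        repair(flip_bit(word, bit), known_mask, known_value) == word
        for bit in range(WORD_BITS)
    )
    return repaired / WORD_BITS


# Hypothetical example: a value statically known to lie in 0..255.
value = 0x5A
known_mask, known_value = narrow_width_known_bits(255)

# A flip in an unused upper bit is cleared by the mask.
assert repair(flip_bit(value, 20), known_mask, known_value) == value
# A flip in a live low bit is not covered by this technique.
assert repair(flip_bit(value, 3), known_mask, known_value) != value
```

In this example 24 of the 32 bits are known to be zero, so `masked_fraction(value, known_mask, known_value)` evaluates to 0.75: a single-bit flip is corrected whenever it lands in the known region, while flips in the 8 live bits pass through unchanged.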
