LLFI: An Intermediate Code Level Fault Injector for Soft Computing Applications

Hardware errors are on the rise as chip feature sizes shrink. However, a certain class of applications, called soft computing applications (e.g., multimedia applications), can tolerate most hardware errors, except those that produce outcomes deviating significantly from the error-free outcomes. We term such outcomes Egregious Data Corruptions (EDCs). To identify the source-code-level characteristics of EDC-causing faults, we built an LLVM-based fault injection tool called LLFI. LLFI performs fault injection at the intermediate code level of the application. We quantitatively validate LLFI's accuracy against assembly-level fault injection. Using LLFI, we performed a study to identify the correlation between faults in specific data types and EDC outcomes. This data categorization will help us identify detector placement locations that provide high coverage for EDC-causing faults.
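
The two core ideas of the abstract, injecting a fault into an intermediate value and classifying the outcome against the error-free run, can be illustrated with a minimal Python sketch. It assumes a single-bit-flip fault model applied to the result of one intermediate computation (standing in for the destination value of an LLVM IR instruction) and judges outcomes by their deviation from a "golden" run. The toy kernel `brighten`, the helper names, the outcome labels other than EDC, and the 10% deviation threshold are all hypothetical choices for illustration, not LLFI's actual interface or criteria.

```python
import random


def flip_bit_int32(value: int, bit: int) -> int:
    """Flip one bit of a 32-bit two's-complement integer (assumes value fits in int32)."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 32)")
    raw = (value & 0xFFFFFFFF) ^ (1 << bit)
    # Reinterpret the 32-bit pattern as a signed integer.
    return raw - (1 << 32) if raw & 0x80000000 else raw


def brighten(pixels, gain, fault=None):
    """Hypothetical soft-computing kernel: scale 8-bit pixels and clamp to [0, 255].

    fault: optional (index, bit) pair; flips `bit` of the intermediate
    product computed for pixel `index`, mimicking a corrupted instruction result.
    """
    out = []
    for i, p in enumerate(pixels):
        v = p * gain  # the intermediate value a fault may corrupt
        if fault is not None and fault[0] == i:
            v = flip_bit_int32(v, fault[1])
        out.append(max(0, min(255, v)))
    return out


def classify(golden, faulty, threshold=0.1):
    """Label an outcome by its mean absolute deviation from the golden output.

    The threshold is a hypothetical stand-in for an application-specific
    fidelity metric.
    """
    if faulty == golden:
        return "masked"
    dev = sum(abs(g - f) for g, f in zip(golden, faulty)) / (255 * len(golden))
    return "EDC" if dev > threshold else "tolerable"


def campaign(pixels, gain, trials, seed=0):
    """Inject one random single-bit fault per run and count the outcome labels."""
    rng = random.Random(seed)
    golden = brighten(pixels, gain)
    counts = {"masked": 0, "tolerable": 0, "EDC": 0}
    for _ in range(trials):
        fault = (rng.randrange(len(pixels)), rng.randrange(32))
        counts[classify(golden, brighten(pixels, gain, fault))] += 1
    return counts


if __name__ == "__main__":
    print(campaign([10, 20, 30, 200], gain=2, trials=1000))
```

In this toy setup, clamping is what masks some faults: flipping bit 0 of the already saturated product for the last pixel (400 becomes 401) still clamps to 255, so the output is unchanged. Flipping bit 7 of the second pixel's product (40 becomes 168) shifts that pixel far enough to cross the threshold. Which deviations count as egregious is application-specific, so a real study would replace the fixed threshold with a domain metric such as PSNR for images.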
