A Methodology to Assess Output Vulnerability Factors for Detecting Silent Data Corruption

As process technology scales, electronic devices become increasingly susceptible to soft errors induced by radiation. Silent data corruption (SDC) is considered the most severe outcome of a soft error. The effect of a faulty variable on producing SDC varies widely from variable to variable, and without profiling variable vulnerability, the derived detectors often suffer from a low SDC detection rate or unacceptable overhead. To assess the vulnerability of variables to SDC, this paper proposes a metric called the Output Vulnerability Factor (OVF). The metric ranks variables during detector derivation so that the most SDC-prone variables in the program can be selectively protected. The calculation of OVF is based on the enhanced Dynamic Dependence Graph (eDDG), a proposed instruction-level error propagation model. We filter out the edges on identified crash-propagation paths and perform a backward traversal of the eDDG to obtain the SDC propagation paths. Furthermore, an error masking probability is estimated for edges corresponding to value comparisons and logical operations. Fault injection experiments show that our approach achieves an SDC detection rate of 65.0% when the top 10% of variables ranked by OVF are monitored. Compared with previous methods, the SDC detection rate increases by 12-21%.
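To make the described workflow concrete, the sketch below illustrates one way the OVF computation could be organized: a backward traversal of a simplified eDDG starting from the output nodes, with edges on identified crash-propagation paths filtered out and a fixed masking probability applied to comparison and logical-operation edges. This is a minimal sketch under stated assumptions, not the paper's implementation; the class and function names (EDDG, sdc_reach_probability, rank_variables), the edge kinds, and the placeholder masking probabilities are illustrative.

```python
"""Illustrative sketch of OVF estimation on a simplified eDDG.
All names and constants here are assumptions for exposition only."""
from collections import defaultdict, deque


class EDDG:
    """Enhanced dynamic dependence graph: nodes are dynamic value instances;
    an edge (src -> dst) means dst consumes the value produced by src."""

    def __init__(self):
        self.predecessors = defaultdict(list)  # node -> [(pred, edge_kind)]

    def add_edge(self, src, dst, kind="data"):
        self.predecessors[dst].append((src, kind))


# Assumed masking probabilities: only edges feeding value comparisons or
# logical operations are given a nonzero chance of masking the error.
MASKING_PROB = {"compare": 0.5, "logical": 0.5, "data": 0.0}


def sdc_reach_probability(graph, outputs, crash_edges):
    """Backward traversal from the output nodes, skipping edges that lie on
    identified crash-propagation paths. Returns, per node, an estimate of the
    probability that an error in it reaches some output (i.e. causes SDC)."""
    reach = {o: 1.0 for o in outputs}
    queue = deque(outputs)
    while queue:
        node = queue.popleft()
        for pred, kind in graph.predecessors[node]:
            if (pred, node) in crash_edges:       # crash paths do not yield SDC
                continue
            p = reach[node] * (1.0 - MASKING_PROB.get(kind, 0.0))
            if p > reach.get(pred, 0.0):          # keep the strongest path only
                reach[pred] = p
                queue.append(pred)
    return reach


def rank_variables(graph, candidates, outputs, crash_edges, top_fraction=0.1):
    """Rank candidate variables by OVF and return the top fraction, i.e. the
    variables a detector-placement pass would protect first."""
    reach = sdc_reach_probability(graph, outputs, crash_edges)
    ranked = sorted(candidates, key=lambda v: reach.get(v, 0.0), reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]
```

The single-path maximum used above is a simplification; an actual OVF computation would need to aggregate propagation probabilities across all SDC propagation paths observed in the eDDG.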
