Investigating the Inherent Soft Error Resilience of Embedded Applications by Full-System Simulation

It has long been acknowledged that some applications are inherently resilient to soft errors; for example, the impact of soft errors on multimedia applications is often not visible to humans. In this paper, we investigate the inherent resilience of two typical embedded applications in two case studies: a control system and a robot arm. Both studies were enabled by our mixed-mode fault injection simulator ETISS-ML, which allows RTL-accurate fault injection while still being able to simulate very long scenarios, e.g., robot movements of several seconds. Our results indicate that a full simulation of the embedded system and its environment is required to classify whether the system can tolerate the impact of a soft error. The reason is that the impact of a given output deviation is hard to predict without examining how the system behavior changes once the control loop is taken into account. In future research, we hope to use this classification method to exploit the inherent resilience and lower the cost of error detection mechanisms.
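The classification idea can be illustrated with a toy closed-loop simulation. The following is a minimal sketch and not ETISS-ML: it is not RTL-accurate, and the PI controller, the first-order plant, the gains, the Q8.8 actuator format, the tolerance band, and all function names are assumptions made purely for illustration. A single bit flip is injected into the quantized actuator command, and the run is classified by comparing the plant output against a fault-free (golden) run.

```python
def flip_bit(value: int, bit: int, width: int = 16) -> int:
    """Flip one bit of a two's-complement integer of the given width."""
    mask = (1 << width) - 1
    raw = (value & mask) ^ (1 << bit)
    # Sign-extend back to a Python int.
    if raw & (1 << (width - 1)):
        raw -= 1 << width
    return raw


def simulate(fault=None, steps=200):
    """Simulate a PI controller driving a first-order plant to setpoint 1.0.

    fault is None (golden run) or a tuple (step, bit): at that step, one bit
    of the quantized actuator command is flipped exactly once.
    All constants below are hypothetical.
    """
    setpoint, y, integ = 1.0, 0.0, 0.0
    kp, ki, dt, tau = 2.0, 1.0, 0.01, 0.1
    trace = []
    for k in range(steps):
        err = setpoint - y
        integ += err * dt
        u = kp * err + ki * integ
        # Quantize the command to signed 16-bit Q8.8 with saturation.
        u_q = max(-32768, min(32767, round(u * 256)))
        if fault is not None and fault[0] == k:
            u_q = flip_bit(u_q, fault[1])
        # Explicit Euler step of the plant dy/dt = (u - y) / tau.
        y += dt * (u_q / 256 - y) / tau
        trace.append(y)
    return trace


def classify(fault, tolerance=0.02):
    """Label a fault by the worst plant-output deviation from the golden run."""
    golden = simulate()
    faulty = simulate(fault)
    worst = max(abs(g - f) for g, f in zip(golden, faulty))
    return "tolerated" if worst <= tolerance else "not tolerated"


if __name__ == "__main__":
    for bit in range(16):
        print(bit, classify((50, bit)))
```

In this toy model the verdict depends on how the plant and the loop respond to the corrupted command, not on the size of the command deviation alone: flips in low-order bits are damped by the plant and stay inside the assumed band, while flips in high-order bits push the output outside it.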
