Applying Compiler-Automated Software Fault Tolerance to Multiple Processor Platforms

Several recent works have explored the feasibility of using commercial off-the-shelf (COTS) processing systems in radiation-prone environments, such as spacecraft. Typically, this approach requires some form of protection to ensure that the software can tolerate radiation upsets without compromising the system. Our recent work, COmpiler Assisted Software fault Tolerance (COAST), provides automated compiler modification of software programs to insert dual- or triple-modular redundancy. In this article, we extend COAST to support several new processing platforms, including RISC-V and SoC-based products from Xilinx (San Jose, CA, USA). The automated software protection mechanisms are tested across a variety of configurations that vary the benchmark and the cache configuration. Across these configurations, the cross sections were improved by $4\times$ to $106\times$. In addition, a hardware-mitigation technique is tested using dual-lock-step cores on the Hercules platform from Texas Instruments (Dallas, TX, USA) and compared with the software-only mitigation approach.
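The dual- and triple-modular redundancy mentioned above can be illustrated with a minimal sketch. This is not COAST's implementation, which applies the replication automatically at compile time; it only shows the underlying idea by hand. All names here (`flip_bit`, `majority_vote`, `dwc_match`, `checksum`) are hypothetical and chosen for illustration, and the upset is simulated by inverting a single bit.

```python
from typing import Iterable


def flip_bit(value: int, bit: int) -> int:
    """Simulate a single-event upset by inverting one bit of an integer."""
    return value ^ (1 << bit)


def majority_vote(a: int, b: int, c: int) -> int:
    """Triple-modular redundancy: bitwise majority of three copies.

    Each output bit takes the value held by at least two of the inputs,
    so an upset confined to one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


def dwc_match(a: int, b: int) -> bool:
    """Dual-modular redundancy (duplication with compare).

    A mismatch reveals that an upset occurred, but with only two copies
    there is no way to tell which one is correct.
    """
    return a == b


def checksum(data: Iterable[int]) -> int:
    """Example workload (an assumption): a 32-bit additive checksum."""
    total = 0
    for x in data:
        total = (total + x) & 0xFFFFFFFF
    return total


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]

    # Run the computation three times, then corrupt one replica.
    r0, r1, r2 = checksum(data), checksum(data), checksum(data)
    r1 = flip_bit(r1, 5)

    print("DWC copies match:", dwc_match(r0, r1))
    print("TMR voted result equals golden:", majority_vote(r0, r1, r2) == r0)
```

The sketch also shows the trade-off between the two modes: duplication can only detect the upset, while triplication can correct it at the cost of a third copy and a voting step.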
