Robust Duplication With Comparison Methods in Microcontrollers

Commercial microprocessors could be useful computational platforms in space systems, as long as the risk is bounded. Many spacecraft are computationally constrained because all of the computation is done on a single radiation-hardened microprocessor. A commercial microprocessor could take over configuration, monitoring, and background tasks that are not mission critical. Most commercial microprocessors, however, are affected by radiation, including single-event effects (SEEs) that can destroy the component or corrupt its data. Part screening can help designers avoid components with destructive failure modes, and mitigation can suppress data corruption. We have been experimenting with a method for masking radiation-induced faults in the software executing on the microprocessor. While triple-modular redundancy (TMR) techniques are very effective at masking faults in software, the added execution time needed to complete the computation is undesirable. In this paper we present a technique that combines duplication with compare (DWC) with TMR and reduces observable errors by as much as a factor of 145 at the cost of only a 2.35-fold decrease in performance.
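The abstract does not say how DWC and TMR are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method: run the computation twice and compare (DWC), and only on a mismatch run it a third time and take a majority vote (TMR). The names `dwc_with_tmr_fallback`, `UnrecoverableMismatch`, and `replay` are hypothetical, and Python is used for readability rather than as a claim about the target microcontroller toolchain.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class UnrecoverableMismatch(Exception):
    """Raised when three executions give three different results."""


def dwc_with_tmr_fallback(compute: Callable[[], T]) -> T:
    """Assumed scheme: duplicate and compare, vote only on disagreement."""
    first = compute()
    second = compute()
    if first == second:
        # Common case: two executions agree, so no third run is needed.
        return first
    # Mismatch detected: a third execution breaks the tie (majority vote).
    third = compute()
    if third == first:
        return first
    if third == second:
        return second
    raise UnrecoverableMismatch("no two executions agreed")


def replay(values: Iterable[T]) -> Callable[[], T]:
    """Test helper (hypothetical): returns the given results one per call,
    standing in for a computation that is occasionally corrupted."""
    iterator = iter(values)
    return lambda: next(iterator)


if __name__ == "__main__":
    # The second execution is "corrupted"; the third run outvotes it.
    print(dwc_with_tmr_fallback(replay([7, 9, 7])))
```

In this arrangement the fault-free path costs two executions instead of three, which is the kind of trade-off between masking and execution time that the abstract describes. How the comparison points and the fallback are actually placed in the paper is not stated here.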
