Adaptive-Hybrid Redundancy with Error Injection

Adaptive-Hybrid Redundancy (AHR) shows promise as a method for trading processing speed against energy efficiency while maintaining a level of error mitigation in space radiation environments. Whereas previous work demonstrated AHR’s feasibility in an error-free environment, this work analyzes AHR performance in the presence of errors. Errors are deliberately injected into AHR at specific times in the processing chain to demonstrate best-case and worst-case performance impacts. This analysis demonstrates that AHR still provides flexibility in processing speed and energy efficiency in the presence of errors.
