Self-Adaptation for Availability in CPU-FPGA Systems Under Soft Errors

We introduce a model-based reliability estimation method that preserves application availability in CPU-FPGA systems exposed to soft errors under varying environmental conditions. The estimation is used in-system to select a suitable configuration as radiation conditions change. This allows a system to autonomously adapt its configuration to balance reliability and performance. Such self-adaptation goes beyond the state of the art, where adaptation relies on preplanned, reactive mode changes. By autonomously evaluating new configurations, our self-adaptation process can increase availability by selecting the configuration that provides the desired application reliabilities under the current environmental conditions.
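The selection step described above can be illustrated with a minimal sketch. It is not the model used in this work. It assumes a simple exponential failure model in which each configuration exposes a number of failure-critical bits, and it picks the fastest configuration whose estimated reliability meets a target. All names (`Configuration`, `sensitive_bits`, `throughput`, `select_configuration`) and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical system configuration (all fields are illustrative)."""
    name: str
    throughput: float      # performance metric, higher is better
    sensitive_bits: float  # bits whose upset is assumed to cause a failure


def reliability(config, upset_rate_per_bit_s, mission_time_s):
    """Probability of no failure over the interval, assuming a constant
    upset rate and exponentially distributed failures."""
    failure_rate = config.sensitive_bits * upset_rate_per_bit_s
    return math.exp(-failure_rate * mission_time_s)


def select_configuration(configs, upset_rate_per_bit_s, mission_time_s, target):
    """Return the fastest configuration that meets the reliability target.
    If none does, fall back to the most reliable configuration."""
    def rel(c):
        return reliability(c, upset_rate_per_bit_s, mission_time_s)

    feasible = [c for c in configs if rel(c) >= target]
    if feasible:
        return max(feasible, key=lambda c: c.throughput)
    return max(configs, key=rel)


# Illustrative values only: an unprotected fast variant and a hardened slow one.
CONFIGS = [
    Configuration("fast_unprotected", throughput=100.0, sensitive_bits=1e6),
    Configuration("hardened_redundant", throughput=40.0, sensitive_bits=1e4),
]

quiet = select_configuration(CONFIGS, 1e-12, 3600.0, target=0.99)
storm = select_configuration(CONFIGS, 1e-10, 3600.0, target=0.99)
print(quiet.name, storm.name)
```

With these assumed numbers, the unprotected variant meets the target at the lower upset rate, while the hardened variant is chosen once the upset rate rises by two orders of magnitude. This mirrors the trade-off between reliability and performance that the adaptation is meant to balance.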
