A scrubbing scheduling approach for reliable FPGA multicore processors with real-time constraints

Typical fault tolerance techniques that protect FPGA processors against soft errors combine hardware redundancy for fault detection with checkpointing/rollback for fault recovery and scrubbing for fault repair. However, to avoid the overheads imposed by redundancy schemes, readback scrubbing can be used as a standalone solution for both fault detection and repair. Since checkpointing and scrubbing both affect the execution time of system tasks, the temporal robustness of systems with real-time constraints protected by these two mechanisms must be addressed. In this paper, we study for the first time the scheduling of the scrubbing task in multicore processors, where the scrubbing task consists of several jobs, each checking the part of the configuration memory occupied by a specific core. We assume real-time multitask applications executed by a multicore processor under the non-preemptive Earliest Deadline First (EDF) algorithm, and we propose a scrubbing scheduling approach, based on a modified version of EDF, that improves the tolerance of the real-time system to transient faults. We demonstrate the efficiency of the proposed approach by running a large number of simulations with random task sets on a dual-core and a quad-core processor.
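To make the scheduling setting concrete, the following is a minimal sketch of plain non-preemptive EDF on a single core, with a per-core scrubbing job queued alongside the application jobs. It is not the modified EDF algorithm proposed in the paper, whose details are not given in the abstract. The `Job` fields, the job names, and the timing values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    release: int   # earliest time the job may start
    wcet: int      # execution time; the job runs to completion once started
    deadline: int  # absolute deadline


def np_edf(jobs):
    """Simulate non-preemptive EDF on one core.

    Returns a list of (name, start, finish, met_deadline) tuples.
    """
    pending = list(jobs)
    time = 0
    schedule = []
    while pending:
        ready = [j for j in pending if j.release <= time]
        if not ready:
            # Core is idle: jump to the next release time.
            time = min(j.release for j in pending)
            continue
        # Pick the ready job with the earliest absolute deadline.
        job = min(ready, key=lambda j: (j.deadline, j.release, j.name))
        start = time
        time += job.wcet  # no preemption: run the job to completion
        schedule.append((job.name, start, time, time <= job.deadline))
        pending.remove(job)
    return schedule


# Hypothetical workload: two application jobs plus one scrubbing job
# that checks the configuration memory of this core (assumed values).
demo_jobs = [
    Job("T1", release=0, wcet=2, deadline=5),
    Job("T2", release=1, wcet=3, deadline=10),
    Job("scrub_core0", release=0, wcet=1, deadline=4),
]

for entry in np_edf(demo_jobs):
    print(entry)
```

In this sketch the scrubbing job is treated as an ordinary job with its own deadline, so it competes with application jobs purely on deadline order. A multicore variant would need one such queue per core, or a global queue, which is a design choice the abstract does not specify.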
