Analyzing the impact of radiation-induced failures in flash-based APSoC with and without fault tolerance techniques at CERN environment

Abstract

All Programmable System-on-Chip (APSoC) devices are designed to provide higher overall programmable flexibility and system performance at lower cost. These characteristics make APSoCs attractive for critical environments, such as the accelerator chain of the European Organization for Nuclear Research (CERN), where electronic components can be exposed simultaneously to high-energy hadrons (protons, neutrons, pions), heavy ions, and other particles. However, APSoCs may be prone to Single Event Effects (SEE). We investigate how the configuration of the Processing System (PS) influences the reliability of a flash-based APSoC. We experimentally study how the radiation-induced error rate of the PS differs across configurations while it executes an application. We also propose two approaches for increasing the reliability of programs running on the embedded processor. Furthermore, we analyze the sensitivity of the system taking into account not only the cross section, but also the system reliability and the Mean Workload Between Failures (MWBF). Preliminary results show that it is possible to double the performance and to increase the system reliability by up to one order of magnitude by managing processor features such as cache memory usage, error-correcting codes, and processor exception handlers.
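
The three metrics named above can be related with a short sketch. It assumes the definitions commonly used in radiation testing, which may differ in detail from the ones adopted in this work: the cross section is the number of observed errors divided by the particle fluence, the reliability follows an exponential model with a constant failure rate equal to cross section times flux, and the MWBF is the workload completed per execution divided by the expected number of failures per execution. All function names and numeric values below are hypothetical and serve only as illustration.

```python
import math


def cross_section(num_errors: float, fluence: float) -> float:
    """Cross section in cm^2: observed errors per unit fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return num_errors / fluence


def failure_rate(sigma: float, flux: float) -> float:
    """Failures per second for a given cross section (cm^2) and flux (particles/cm^2/s)."""
    return sigma * flux


def reliability(sigma: float, flux: float, t: float) -> float:
    """Probability of surviving t seconds, assuming a constant failure rate."""
    return math.exp(-failure_rate(sigma, flux) * t)


def mwbf(workload: float, sigma: float, flux: float, exec_time: float) -> float:
    """Mean Workload Between Failures: workload per run over expected failures per run."""
    failures_per_run = failure_rate(sigma, flux) * exec_time
    if failures_per_run <= 0:
        raise ValueError("expected failures per run must be positive")
    return workload / failures_per_run


if __name__ == "__main__":
    # Hypothetical numbers, not measurements from this work.
    sigma = cross_section(num_errors=50, fluence=1e10)  # cm^2
    flux = 1e5                                          # particles/cm^2/s
    print("cross section [cm^2]:", sigma)
    print("reliability after 1 h:", reliability(sigma, flux, 3600.0))
    print("MWBF [workload units]:", mwbf(1.0, sigma, flux, exec_time=2.0))
```

Under these assumed definitions, a configuration that halves the execution time without changing the cross section doubles the MWBF, which is why a faster but more exposed configuration (for example, one with caches enabled) has to be judged on MWBF and reliability rather than on cross section alone.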