Mitigating Soft Errors in Processor Cores Embedded in System-on-Programmable-Chips

Newer generations of Field Programmable Gate Arrays (FPGAs) embed advanced intellectual property (IP) cores, such as fast digital signal processors (DSPs), memory blocks, and processors. These cores are implemented in dedicated parts of the silicon, so they do not consume the reconfigurable fabric, which remains available to system designers. This new class of devices, which combines dedicated computing cores with programmable fabric, is often referred to as system-on-programmable-chip (SoPC). Several application domains where embedded processors must run computing-intensive algorithms in real time, such as industrial control and automotive, have already recognized the benefits of SoPCs. The space application domain may benefit as well, provided that the problems specific to it are solved. In particular, since SoPCs are designed for ground-based applications, the consequences of the interaction of ionizing radiation with SoPC silicon, which triggers effects such as Total Ionizing Dose (TID) and Single Event Effects (SEE), are of particular concern to designers who want to deploy SoPCs in space. This chapter first summarizes the effects of radiation on SoPCs, with particular emphasis on SEEs in the processor cores the device embeds. It then provides an overview of the techniques for coping with them, focusing on Software-Implemented Fault Tolerance (SIFT) techniques. Finally, it proposes a novel architecture.
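To make the idea of SIFT concrete, the following is a minimal sketch of one common family of such techniques: data duplication with consistency checks. Every variable is kept in two copies, every write updates both copies, and every read compares them, so an SEE that flips a bit in one copy is detected the next time the value is used. This is an illustration under stated assumptions, not the architecture proposed in this chapter: the names `dup_int32`, `dup_write`, `dup_read`, `dup_add`, `sift_error`, and `sift_error_detected` are hypothetical, and the error handler only records the detection, whereas a real system might reset the core or notify a supervisor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical detection flag: set when the two copies of a datum disagree. */
static volatile bool sift_error_detected = false;

/* Hypothetical error handler; a real system might reset or re-execute. */
static void sift_error(void) {
    sift_error_detected = true;
}

/* Duplicated variable. volatile forces separate loads and stores, so the
   compiler cannot assume the two copies are equal and merge them. */
typedef struct {
    volatile int32_t v0;
    volatile int32_t v1;
} dup_int32;

/* Every write updates both copies. */
static void dup_write(dup_int32 *x, int32_t value) {
    x->v0 = value;
    x->v1 = value;
}

/* Every read checks the two copies for consistency before using the value. */
static int32_t dup_read(const dup_int32 *x) {
    int32_t a = x->v0;
    int32_t b = x->v1;
    if (a != b) {
        sift_error();
    }
    return a;
}

/* c = a + b: operands are checked, then each copy of the result is computed
   independently from the matching copies of the operands. */
static void dup_add(dup_int32 *c, const dup_int32 *a, const dup_int32 *b) {
    if (a->v0 != a->v1 || b->v0 != b->v1) {
        sift_error();
    }
    c->v0 = a->v0 + b->v0;
    c->v1 = a->v1 + b->v1;
}
```

For example, after `dup_write(&a, 2)` and `dup_write(&b, 3)`, `dup_add(&c, &a, &b)` yields 5 in both copies of `c`. If a bit flip is simulated with `a.v1 ^= (1 << 2)` (turning 2 into 6 in one copy), the next `dup_add` call sets `sift_error_detected`. The price of this design is roughly doubled memory for protected data and extra instructions for every access, which is the usual trade-off of purely software-based detection.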
