Maximizing transient availability of real-time Onboard Reconfigurable Processing Platforms: An analytical redundancy inspired approach

An Onboard Reconfigurable Processing Platform (ORPP), which mainly consists of reconfigurable devices (FPGAs) and auxiliary co-processors such as DSPs, is dedicated to in-situ real-time computing for various space missions. Harsh ionizing radiation effects have been observed during flight, which makes it crucial to design fault tolerance into ORPPs. Transient availability refers to the ability of an ORPP to mask transient faults in its circuits, such as Single Event Upsets (SEUs) and Single Event Transients (SETs), and thereby maintain both the correctness and the timeliness of its outputs. Redundancy at different levels is a useful means of masking transient faults and raising an error indicator for a later fault recovery process. Among these, chip-level redundancy is often regarded as coarse-grained and expensive in terms of area and cost. Fine-grained redundancy approaches, including Triple Modular Redundancy (TMR) within an FPGA, struggle to map user modules with sufficient diversity, because adjacently placed replicas of a module lack immunity to multiple-bit upsets (MBUs) induced by a single particle. We review the chip-level method and propose an analytical-redundancy-inspired fault-tolerant scheme that uses the spare computing resources of the co-processors to generate two off-chip redundant modules, whose outputs are combined by majority voting in the voter. In contrast to conventional hardware redundancy methods, the proposed approach takes advantage of the different radiation behaviors of FPGAs and DSPs, and thereby maximizes transient availability.
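
The voting step described above can be illustrated with a minimal software sketch. It is not the authors' voter design. It only assumes that one FPGA result and two redundant results computed on the co-processors are compared word by word, and that any disagreement raises an error indicator for later recovery. The function name `majority_vote` and the variables `fpga_out`, `dsp_out_a`, and `dsp_out_b` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], bool]:
    """Return (voted_value, error_flag) for a set of replica outputs.

    voted_value is None when no strict majority exists (fault not maskable).
    error_flag is True whenever at least one replica disagrees.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        return None, True  # no strict majority among the replicas
    return value, count != len(outputs)


# Hypothetical scenario: the FPGA result suffers a single-bit upset,
# while the two co-processor results (assumed names) remain correct.
fpga_out = 0x5A ^ 0x04
dsp_out_a = 0x5A
dsp_out_b = 0x5A

voted, error = majority_vote([fpga_out, dsp_out_a, dsp_out_b])
print(hex(voted), error)
```

In this sketch the upset is masked because two of the three replicas still agree, and the error flag stays set so that a recovery step could be triggered afterwards.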
