A Partial Reconfiguration-based scheme to mitigate Multiple-Bit Upsets for FPGAs in low-cost space applications

Conventionally, the design of fault-tolerant architectures for space applications has focused mainly on reliability and correction latency. However, to meet cost-reduction requirements, power consumption must also be minimized, as it determines the battery size and therefore the weight of the satellite. While technology scaling helps toward this goal, it also increases circuit sensitivity to Multiple-Bit Upsets (MBUs), so specific design techniques must be applied to compensate for this effect. In order to leverage high-performance, low-cost Commercial Off-The-Shelf (COTS) FPGAs in space applications, this work tackles fault tolerance at three abstraction levels: circuit, organization and control. At the circuit level, a new ultra-low-overhead Forward Temporal Redundancy (FTR) scheme is proposed for error detection in user logic. At the organization level, this work leverages the opportunities offered by frame-based and module-based Dynamic Partial Reconfiguration (DPR) to handle configuration memory errors. At the control level, this work exploits the Xilinx Zynq System-on-Chip FPGA, whose embedded hard processor is used to preserve circuit state through checkpointing and rollback. The overall topology is validated through fault injection on a five-stage pipelined MIPS processor, achieving 99.998% reliability at a global resource overhead of 85% in LUTs and 125% in flip-flops.
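As a rough illustration of the checkpointing-and-rollback idea at the control level, the following Python sketch simulates a toy 16-bit state machine whose steps are executed twice and compared. It is not the FTR scheme or the Zynq implementation described in this work: the double-execution check is a generic form of temporal redundancy, and the names `step`, `inject_mbu`, `run_with_rollback` and `faulty_attempts` are hypothetical, introduced only for this example. The sketch also assumes that at most one of the two executions is corrupted in any given attempt, so every injected upset is detected.

```python
def step(state):
    """Hypothetical deterministic 'circuit' step on a 16-bit state."""
    return (state * 5 + 3) & 0xFFFF


def inject_mbu(value, low_bit):
    """Toy multiple-bit upset: flip two adjacent bits starting at low_bit."""
    return value ^ (0b11 << low_bit)


def run_with_rollback(initial, n_steps, checkpoint_every, faulty_attempts):
    """Run n_steps with compare-based detection and rollback to a checkpoint.

    faulty_attempts is a set of attempt indices (assumed fault schedule) at
    which the second execution of the step is corrupted by a toy MBU.
    Returns the final state and the number of rollbacks performed.
    """
    state = initial
    checkpoint = (0, initial)  # (step index, saved state)
    rollbacks = 0
    attempt = 0                # counts every attempted step, including retries
    i = 0
    while i < n_steps:
        first = step(state)
        second = step(state)   # temporal redundancy: recompute the same step
        if attempt in faulty_attempts:
            second = inject_mbu(second, low_bit=attempt % 15)
        attempt += 1
        if first != second:
            # Mismatch detected: restore the last checkpoint and retry.
            i, state = checkpoint
            rollbacks += 1
            continue
        state = first
        i += 1
        if i % checkpoint_every == 0:
            checkpoint = (i, state)  # save a new known-good state
    return state, rollbacks


def golden_run(initial, n_steps):
    """Fault-free reference execution used to check the recovered result."""
    state = initial
    for _ in range(n_steps):
        state = step(state)
    return state
```

With a fault schedule such as `{3, 10}`, `run_with_rollback(1, 20, 4, {3, 10})` ends in the same state as `golden_run(1, 20)` after two rollbacks. The checkpoint interval trades stored-state traffic against the amount of work repeated after each detected error; the sketch covers only user-logic state and does not model configuration memory repair through DPR.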
