Protecting SRAM-based FPGAs against multiple bit upsets using erasure codes

Multiple bit upsets (MBUs) caused by radiation-induced soft errors are a major concern in nanoscale technology nodes. When such errors occur in the configuration frames of an FPGA, they persistently alter the functionality of the mapped design until the affected frames are rewritten. Combining error correction schemes with configuration scrubbing is an efficient way to remove such persistent errors. Existing solutions rely on coding techniques with considerable overhead to protect configuration frames against MBUs. In this paper, we propose a generic scrubbing scheme that reconstructs an erroneous configuration frame based on the concept of erasure codes. The proposed scheme requires no changes to the FPGA architecture. Experimental results on a Xilinx Virtex-6 FPGA show that it achieves an error recovery coverage of 99.30% with only 3% resource utilization, while its mean time to repair is comparable to that of previous schemes.
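The abstract does not spell out the encoding, so the following is only a minimal sketch of the erasure-code idea under loudly stated assumptions, not the paper's actual scheme. It assumes that a group of configuration frames shares a single XOR parity frame (the simplest erasure code), and that an erroneous frame is located by comparing a per-frame CRC (Python's `zlib.crc32`, standing in for whatever detection mechanism the real scheme uses) against a golden value. Because a located frame is treated as an erasure and rebuilt wholesale, the number of flipped bits inside it does not matter, which is what makes erasure coding attractive for MBUs. The names `frame_crc`, `encode_parity`, `reconstruct` and `scrub`, and the tiny two-word frames, are hypothetical.

```python
import struct
import zlib
from functools import reduce
from typing import List, Sequence, Tuple

# Illustrative model (assumption): a configuration frame is a list of 32-bit words.
# Real frame lengths are device-specific; the demo below uses tiny frames.
Frame = List[int]


def frame_crc(frame: Sequence[int]) -> int:
    """Hypothetical per-frame error detector: CRC-32 over the packed words."""
    return zlib.crc32(struct.pack(f"<{len(frame)}I", *frame))


def xor_frames(a: Sequence[int], b: Sequence[int]) -> Frame:
    """Word-wise XOR of two equally sized frames."""
    return [x ^ y for x, y in zip(a, b)]


def encode_parity(frames: Sequence[Sequence[int]]) -> Frame:
    """Single XOR parity frame over a group, computed once from the golden frames."""
    return reduce(xor_frames, frames[1:], list(frames[0]))


def reconstruct(frames: Sequence[Sequence[int]], parity: Sequence[int], erased: int) -> Frame:
    """Rebuild the erased frame as parity XOR every surviving frame."""
    survivors = [f for i, f in enumerate(frames) if i != erased]
    return reduce(xor_frames, survivors, list(parity))


def scrub(frames, golden_crcs, parity) -> Tuple[List[Frame], List[int]]:
    """One scrubbing pass: locate bad frames by CRC, then treat them as erasures."""
    bad = [i for i, f in enumerate(frames) if frame_crc(f) != golden_crcs[i]]
    if len(bad) > 1:
        # A single XOR parity frame can recover at most one erasure per group.
        raise RuntimeError(f"unrecoverable: {len(bad)} erroneous frames in one group")
    repaired = [list(f) for f in frames]
    if bad:
        repaired[bad[0]] = reconstruct(frames, parity, bad[0])
    return repaired, bad


if __name__ == "__main__":
    golden = [[0x11111111, 0x22222222], [0x33333333, 0x44444444], [0x55555555, 0x66666666]]
    crcs = [frame_crc(f) for f in golden]
    parity = encode_parity(golden)

    upset = [list(f) for f in golden]
    upset[1][0] ^= 0x000F00F0  # multiple bit upset: eight flipped bits in one frame

    repaired, bad = scrub(upset, crcs, parity)
    print(bad, repaired == golden)  # [1] True
```

With only one parity frame per group, the sketch refuses to repair a group containing two or more erroneous frames rather than guessing; covering that case would call for smaller groups or a stronger erasure code.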
