Selectively Fortifying Reconfigurable Computing Device to Achieve Higher Error Resilience

With the advent of 10 nm CMOS devices and "exotic" nanodevices, the location and occurrence time of hardware defects and design faults become increasingly unpredictable. This poses severe challenges to existing techniques for error-resilient computing, because most of them assign hardware redundancy statically and do not account for the error tolerance inherent in many mission-critical applications. This work proposes a novel approach that selectively fortifies a target reconfigurable computing device in order to achieve hardware-efficient error resilience for a specific target application. We intend to demonstrate that such error resilience can be significantly improved with effective hardware support. The major contributions of this work include (1) a complete methodology for performing sensitivity and criticality analysis of hardware redundancy, (2) a novel problem formulation and an efficient heuristic methodology for selectively allocating hardware redundancy among a target design's key components in order to maximize its overall error resilience, and (3) an academic prototype of an SFC computing device that demonstrates a fourfold improvement in error resilience for an H.264 encoder implemented on an FPGA device.
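The idea of selectively allocating redundancy under a limited hardware budget can be illustrated with a small sketch. This is not the heuristic developed in this work, whose formulation is not given in the abstract. It is a generic greedy selection that assumes each component carries a single criticality score and a fixed fortification cost. The names `Component`, `select_fortified`, `criticality`, and `cost`, as well as the component names and numbers in the usage example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    criticality: float  # assumed: share of output error attributed to faults here
    cost: int           # assumed: extra hardware units needed to fortify it (> 0)


def select_fortified(components, budget):
    """Greedy sketch: fortify components in order of criticality per unit cost.

    Returns the names of the chosen components and the budget actually used.
    This is an illustrative stand-in, not the paper's allocation heuristic.
    """
    ranked = sorted(components, key=lambda c: c.criticality / c.cost, reverse=True)
    chosen = []
    remaining = budget
    for comp in ranked:
        # Skip any component that no longer fits in the remaining budget.
        if comp.cost <= remaining:
            chosen.append(comp.name)
            remaining -= comp.cost
    return chosen, budget - remaining


# Made-up numbers for four blocks of a video encoder.
blocks = [
    Component("motion_estimation", criticality=0.50, cost=4),
    Component("transform", criticality=0.30, cost=2),
    Component("entropy_coding", criticality=0.15, cost=3),
    Component("deblocking", criticality=0.05, cost=2),
]
chosen, used = select_fortified(blocks, budget=6)
```

A ratio-based greedy pass like this is easy to reason about but is not guaranteed to be optimal, which is one reason a dedicated problem formulation and heuristic, as described in contribution (2), would be needed in practice.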
