Program-Invariant Checking for Soft-Error Detection using Reconfigurable Hardware

Concern about transient errors in deep-submicron processor architectures is increasing. Software-only error detection approaches that exploit program invariants to detect silent errors incur large execution overheads and are unreliable, because state can be corrupted after an invariant checkpoint has passed. In this article, we explore the use of configurable hardware structures for the continuous evaluation of high-level program invariants at the assembly level. We evaluate the resource requirements and performance of the proposed predicate-evaluation hardware structures when integrated with a 32-bit MIPS soft core on a contemporary reconfigurable hardware device. The results, for a small set of kernel codes, reveal that these structures require very few hardware resources and have negligible impact on the processor core into which they are integrated. Moreover, the amount of resources is fairly insensitive to the complexity of the invariants, making the proposed structures an attractive alternative to software-only predicate checking.
