A new hybrid fault detection technique for systems-on-a-chip

Hardening SoCs against transient faults requires new techniques able to combine high fault detection capabilities with the usual requirements of SoC design flow, e.g., reduced design-time, low area overhead, and reduced (or null) accessibility to source core descriptions. This paper proposes a new hybrid approach which combines hardening software transformations with the introduction of an Infrastructure IP with reduced memory and performance overheads. The proposed approach targets faults affecting the memory elements storing both the code and the data, independently of their location (inside or outside the processor). Extensive experimental results, including comparisons with previous approaches, are reported, which allow practically evaluating the characteristics of the method in terms of fault detection capabilities and area, memory, and performance overheads.

[1]  M. Namjoo,et al.  WATCHDOG PROCESSORS AND CAPABILITY CHECKING , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 1995, ' Highlights from Twenty-Five Years'..

[2]  R. Velazco,et al.  Experimentally evaluating an automatic approach for generating safety-critical software with respect to transient errors , 2000 .

[3]  M. Rimen,et al.  Implicit signature checking , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[4]  John Paul Shen,et al.  Processor Control Flow Monitoring Using Signatured Instruction Streams , 1987, IEEE Transactions on Computers.

[5]  Fabian Vargas,et al.  Hybrid soft error detection by means of infrastructure IP cores [SoC implementation] , 2004 .

[6]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[7]  Brian Randell System structure for software fault tolerance , 1975 .

[8]  R. Koga,et al.  Single Event Upset Rate Estimates for a 16-K CMOS SRAM , 1985, IEEE Transactions on Nuclear Science.

[9]  S. Rezgui,et al.  Predicting error rate for microprocessor-based digital architectures through C.E.U. (Code Emulating Upsets) injection , 2000 .

[10]  V. K. Agarwal,et al.  Continuous Signature Monitoring: Low-Cost Concurrent Detection of Processor Control Errors , 1990 .

[11]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[12]  Massimo Violante,et al.  Soft-error detection using control flow assertions , 2003, Proceedings 18th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems.

[13]  Edward J. McCluskey,et al.  Concurrent Error Detection Using Watchdog Processors - A Survey , 1988, IEEE Trans. Computers.

[14]  Michael Nicolaidis,et al.  SEU-tolerant SRAM design based on current monitoring , 1994, Proceedings of IEEE 24th International Symposium on Fault- Tolerant Computing.

[15]  Massimo Violante,et al.  An FPGA-Based Approach for Speeding-Up Fault Injection Campaigns on Safety-Critical Circuits , 2002, J. Electron. Test..

[16]  Michael Nicolaidis Time redundancy based soft-error tolerance to rescue nanometer technologies , 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No.PR00146).

[17]  Edward J. McCluskey,et al.  Concurrent Fault Detection Using a Watchdog Processor and Assertions , 1983, ITC.

[18]  Suku Nair,et al.  Design and Evaluation of System-Level Checks for On-Line Control Flow Error Detection , 1999, IEEE Trans. Parallel Distributed Syst..

[19]  Algirdas Avizienis,et al.  The N-Version Approach to Fault-Tolerant Software , 1985, IEEE Transactions on Software Engineering.

[20]  Jacob A. Abraham,et al.  Algorithm-Based Fault Tolerance for Matrix Operations , 1984, IEEE Transactions on Computers.

[21]  Michael Nicolaidis,et al.  Embedded robustness IPs for transient-error-free ICs , 2002, IEEE Design & Test of Computers.

[22]  Yervant Zorian Guest Editor's Introduction: What is Infrastructure IP? , 2002, IEEE Des. Test Comput..

[23]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[24]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[25]  Ahmad A. Al-Yamani,et al.  Performance evaluation of checksum-based ABFT , 2001, Proceedings 2001 IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems.

[26]  Stephen S. Yau,et al.  An Approach to Concurrent Control Flow Checking , 1980, IEEE Transactions on Software Engineering.