Analysis of single-event effects in embedded processors for non-uniform fault tolerant design

Advances in silicon technology and shrinking the feature size to nanometer scale make unreliability of nano devices the most important concern of fault-tolerant designs. Design of reliable and fault-tolerant embedded processors is mostly based on developing techniques that compensate adding hardware or software redundancy. The recently-proposed redundancy techniques are generally applied uniformly to a system and lead to inefficiencies in terms of performance, power, and area. Non-uniform redundancy requires a quantitative analysis of the system behavior encountering transient faults. In this paper, we introduce a custom fault injection framework that helps to locate the most vulnerable nodes and components of embedded processors. Our framework is based on an exhaustive transient fault injection to candidate nodes which are selected from a user-defined list. Furthermore, the list of nodes containing the microarchitectural state is also defined by user to validate execution of instructions. Based on the reported results, the most vulnerable nodes, components, and instructions are found and could be used for an effective non-uniform fault-tolerant redundancy technique.

[1]  Ravishankar K. Iyer,et al.  An experimental study of soft errors in microprocessors , 2005, IEEE Micro.

[2]  Diana Marculescu,et al.  MARS-C: modeling and reduction of soft errors in combinational circuits , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[3]  Y. Yagil,et al.  A systematic approach to SER estimation and solutions , 2003, 2003 IEEE International Reliability Physics Symposium Proceedings, 2003. 41st Annual..

[4]  R. Baumann The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction , 2002, Digest. International Electron Devices Meeting,.

[5]  Peter Hazucha,et al.  Characterization of soft errors caused by single event upsets in CMOS processes , 2004, IEEE Transactions on Dependable and Secure Computing.

[6]  Y. C. Yeh,et al.  Triple-triple redundant 777 primary flight computer , 1996, 1996 IEEE Aerospace Applications Conference. Proceedings.

[7]  Sied Mehdi Fakhraie,et al.  Design of a Custom Packet Switching Engine for Network Applications , 2008 .

[8]  Abhijit Chatterjee,et al.  Soft-error tolerance analysis and optimization of nanometer circuits , 2005, Design, Automation and Test in Europe.

[9]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.

[10]  Krisztián Flautner,et al.  A Microarchitectural Analysis of Soft Error Propagation in a Production-Level Embedded Microprocessor , 2005 .

[11]  Narayanan Vijaykrishnan,et al.  SEAT-LA: a soft error analysis tool for combinational logic , 2006, 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID'06).

[12]  Seyed Ghassem Miremadi,et al.  Dependability analysis using a fault injection tool based on synthesizability of HDL models , 2003, Proceedings 18th IEEE Symposium on Defect and Fault Tolerance in VLSI Systems.

[13]  Robert W. Horst,et al.  Multiple instruction issue in the NonStop cyclone processor , 1990, ISCA '90.

[14]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[15]  Xiaodong Li,et al.  Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[16]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.