Built-In Soft Error Resilience for Robust System Design

Built-in soft error resilience (BISER) is an architecture-aware circuit design technique for correcting soft errors in latches, flip-flops and combinational logic. BISER enables more than an order of magnitude reduction in chip-level soft error rate with minimal area impact, 6-10% chip-level power impact, and 1-5% performance impact (depending on whether combinational logic error correction is implemented or not). In comparison, several traditional error-detection techniques introduce 40-100% power, performance and area penalties, and require significant efforts for designing and validating corresponding recovery mechanisms. In addition, BISER enables system design with configurable soft error protection features. Such features are extremely important for future designs targeting applications with a wide range of power, performance and reliability constraints. Design trade-offs associated with BISER and other existing soft error protection techniques are also analyzed.

[1]  Y. Yagil,et al.  A systematic approach to SER estimation and solutions , 2003, 2003 IEEE International Reliability Physics Symposium Proceedings, 2003. 41st Annual..

[2]  N. Seifert,et al.  Robust system design with built-in soft-error resilience , 2005, Computer.

[3]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[4]  Kartik Mohanram,et al.  Gate sizing to radiation harden combinational logic , 2006, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[5]  K ReinhardtSteven,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002 .

[6]  Lisa Spainhower,et al.  IBM S/390 Parallel Enterprise Server G5 fault tolerance: A historical perspective , 1999, IBM J. Res. Dev..

[7]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[8]  Ming Zhang,et al.  Logic soft errors: a major barrier to robust platform design , 2005, IEEE International Conference on Test, 2005..

[9]  Edward J. McCluskey,et al.  Dependable Computing and Online Testing in Adaptive and Configurable Systems , 2000, IEEE Des. Test Comput..

[10]  Naresh R. Shanbhag,et al.  Sequential Element Design With Built-In Soft Error Resilience , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[11]  Edward J. McCluskey,et al.  Error detection by duplicated instructions in super-scalar processors , 2002, IEEE Trans. Reliab..

[12]  Lisa Spainhower,et al.  Commercial fault tolerance: a tale of two systems , 2004, IEEE Transactions on Dependable and Secure Computing.

[13]  S. Vangal,et al.  Selective node engineering for chip-level soft error rate improvement [in CMOS] , 2002, 2002 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.02CH37302).

[14]  Edward J. McCluskey,et al.  Control-flow checking by software signatures , 2002, IEEE Trans. Reliab..

[15]  Edward J. McCluskey,et al.  ED4I: Error Detection by Diverse Data and Duplicated Instructions , 2002, IEEE Trans. Computers.

[16]  Subhasish Mitra,et al.  S: Error Detection by Diverse Data and Duplicated Instructions , 2002 .

[17]  Ming Zhang,et al.  Combinational Logic Soft Error Correction , 2006, 2006 IEEE International Test Conference.

[19]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[20]  S. Mitra,et al.  Erratic Bit Errors in Latches , 2007, 2007 IEEE International Reliability Physics Symposium Proceedings. 45th Annual.

[21]  N. Seifert,et al.  Timing vulnerability factors of sequentials , 2004, IEEE Transactions on Device and Materials Reliability.

[22]  T. Calin,et al.  Upset hardened memory design for submicron CMOS technology , 1996 .

[23]  Sanjay J. Patel,et al.  Characterizing the effects of transient faults on a high-performance processor pipeline , 2004, International Conference on Dependable Systems and Networks, 2004.