Bit-slice logic interleaving for spatial multi-bit soft-error tolerance

Semiconductor devices are becoming more susceptible to single-event upsets (SEUs) as device dimensions, operating voltages and frequencies are scaled. Most of the architecture-, logic- and circuit-level techniques developed to address SEUs in logic assume a single-point fault model. That assumption will soon be insufficient, because spatial multi-bit errors are becoming prevalent in highly scaled devices. In this paper, we explore this new fault model and evaluate how effectively conventional fault tolerance techniques mitigate such faults. We also extend the idea of bit interleaving in memory to logic bit slices and explore its utility as an approach to spatial multi-bit error mitigation in logic. We compare these techniques using a case study of a Brent-Kung adder in a 90-nm process.
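
The intuition behind bit interleaving can be illustrated with a small sketch. It is not the implementation evaluated in the paper. It assumes a hypothetical one-dimensional layout, two 4-bit words each protected by a single parity bit, and an upset that flips two physically adjacent bits. All function names are illustrative.

```python
def parity(bits):
    """Return the XOR of all bits (even-parity check value)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def interleave(words):
    """Place bit i of every word side by side: w0[0], w1[0], w0[1], w1[1], ..."""
    return [w[i] for i in range(len(words[0])) for w in words]


def deinterleave(row, n_words):
    """Recover the logical words from an interleaved physical row."""
    return [row[k::n_words] for k in range(n_words)]


def flip_adjacent(row, start, width):
    """Model a spatial multi-bit upset: flip `width` adjacent physical bits."""
    out = list(row)
    for i in range(start, start + width):
        out[i] ^= 1
    return out


def parity_mismatch(original, corrupted):
    """Per word, report whether a parity check would flag the corruption."""
    return [parity(o) != parity(c) for o, c in zip(original, corrupted)]


words = [[1, 0, 1, 1], [0, 1, 1, 0]]  # assumed example data

# Layout A (assumed): words stored back to back, so both flips land in word 0.
flat = words[0] + words[1]
hit_flat = flip_adjacent(flat, 1, 2)
flat_result = parity_mismatch(words, [hit_flat[:4], hit_flat[4:]])

# Layout B (assumed): bits interleaved, so adjacent flips land in different words.
row = interleave(words)
hit_row = flip_adjacent(row, 1, 2)
interleaved_result = parity_mismatch(words, deinterleave(hit_row, 2))

print(flat_result)         # double flip in one word cancels in the parity
print(interleaved_result)  # each word sees a single flip
```

In the back-to-back layout the two flips fall in the same parity group and cancel, so the error escapes detection. With interleaving, each parity group sees only one flipped bit, which single-error detection can catch.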
