SVFD: A Versatile Online Fault Detection Scheme via Checking of Stability Violation

In ultra-deep submicrometer technology, soft errors and device aging are two of the paramount reliability concerns. Although many studies have tackled these two challenges, most so far treat them separately and therefore fail to reach better performance-cost tradeoffs. To support a more efficient design tradeoff, we propose a unified fault detection scheme, stability violation-based fault detection (SVFD), by which soft errors (both single event upsets and single event transients), aging delay, and delay faults can be dealt with uniformly. SVFD is grounded in a new fault model, stability violation, derived from an analysis of signal behavior. SVFD has been validated through a set of intensive Hspice simulations targeting the next-generation 32-nm CMOS technology. An application of SVFD to a floating-point unit (FPU) is also evaluated. Experimental results show that SVFD offers more versatile fault detection capability than several recently proposed schemes, at comparable overhead in terms of area, power, and performance.
