Efficient in situ error detection enabling diverse path coverage

Technology scaling continues to improve density, but it also reduces the critical charge needed to hold a logic state, making devices more susceptible to accidental disruption from noise and soft errors. Increased process variation compounds the reliability challenge, leading to overdesign and extra timing margins that cost power, silicon area, and performance. We present efficient in situ error detection techniques that exploit datapath characteristics to monitor circuit errors: pre-edge checking in non-critical paths, which imposes no hold time constraints; post-edge checking in critical paths, which does not sacrifice performance; and cross-edge checking in moderate paths, which offers the best trade-off between the two. All three techniques are realized using the inherent redundancy within a conventional flip-flop design and require no logic or sample duplication, unlike most existing methods. The detection-enabled flip-flop uses only 31 transistors, making it a competitive, low-cost solution.
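As a rough illustration of how the three checking styles map onto path criticality, the following Python sketch picks a style from a path's timing slack. The function name, the use of slack as a fraction of the clock period, and both threshold values are assumptions made for this example only; the abstract does not specify how paths are classified.

```python
from enum import Enum


class CheckMode(Enum):
    PRE_EDGE = "pre-edge"      # non-critical paths
    CROSS_EDGE = "cross-edge"  # moderate paths
    POST_EDGE = "post-edge"    # critical paths


def choose_check_mode(slack_fraction, critical_below=0.1, non_critical_above=0.4):
    """Pick a checking style from setup slack given as a fraction of the clock period.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    if not 0.0 <= slack_fraction <= 1.0:
        raise ValueError("slack_fraction must lie in [0, 1]")
    if slack_fraction < critical_below:
        # Little slack: data may arrive late, so check after the clock edge.
        return CheckMode.POST_EDGE
    if slack_fraction >= non_critical_above:
        # Ample slack: data settles early, so check before the clock edge.
        return CheckMode.PRE_EDGE
    # In between: the checking window straddles the clock edge.
    return CheckMode.CROSS_EDGE


if __name__ == "__main__":
    for slack in (0.05, 0.25, 0.60):
        print(slack, choose_check_mode(slack).value)
```

In a real flow the slack values would come from static timing analysis, and the thresholds would be tuned against the detection window each flip-flop variant supports.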
