Eliminating Timing Errors Through Collaborative Design to Maximize the Throughput

In advanced technology nodes, large timing margins must be added to allow for worst-case process, voltage, temperature, and aging variations. The error detection and correction (EDAC) technique effectively eliminates these margins through timing speculation, but high design complexity and large hardware cost make many existing EDAC systems unsuitable for commercial processors. Based on the instruction-level locality of timing errors, a collaborative EDAC approach is proposed to address this issue. The hardware layer adopts simple, low-cost EDAC circuits to ensure correct operation when a timing error occurs, while a runtime software layer prevents recurring errors of the same instruction by sending timing error alarms to the hardware layer. The cooperation of both layers, together with the proposed profile-guided timing error avoidance algorithm, eliminates more than 95% of errors with small runtime overhead. This significantly improves overall performance and alleviates pressure on the EDAC circuits. Experimental results on the three-stage commercial CK802 processor in the SMIC 40LL process show that the approach improves the peak performance of the baseline EDAC system (Razor-Lite + half-frequency replay) by 8% and reduces energy consumption by 25%, with less than 1.4% area overhead.
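The division of labor between the two layers can be illustrated with a toy model. The sketch below is not the paper's profile-guided algorithm, which the abstract does not detail. It only assumes that a timing error is tied to a particular instruction address, that the hardware replays the instruction the first time it fails, and that the software layer then raises an alarm for every later occurrence of that address. The names `CollaborativeEdac`, `alarmed_pcs`, and `avoided_fraction` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollaborativeEdac:
    """Toy model: hardware detects and replays; software remembers offending PCs."""

    alarmed_pcs: set = field(default_factory=set)  # PCs flagged by the software layer (assumed structure)
    errors_replayed: int = 0  # errors caught by the hardware EDAC and fixed by replay
    errors_avoided: int = 0   # errors prevented because an alarm was raised in advance

    def execute(self, pc: int, is_critical: bool) -> None:
        if not is_critical:
            return  # instruction meets timing; nothing to do
        if pc in self.alarmed_pcs:
            # Alarm already registered: assume the hardware gives this
            # instruction extra time, so no error occurs.
            self.errors_avoided += 1
        else:
            # First occurrence: hardware detects the error and replays,
            # then the software layer records the PC for next time.
            self.errors_replayed += 1
            self.alarmed_pcs.add(pc)


def avoided_fraction(trace, critical_pcs):
    """Fraction of would-be timing errors that were avoided by alarms."""
    model = CollaborativeEdac()
    for pc in trace:
        model.execute(pc, pc in critical_pcs)
    total = model.errors_replayed + model.errors_avoided
    return model.errors_avoided / total if total else 0.0


# A four-instruction loop executed 50 times, with one timing-critical instruction.
loop_trace = [0, 4, 8, 12] * 50
print(avoided_fraction(loop_trace, critical_pcs={8}))
```

In this toy trace only the first execution of the critical instruction needs a replay, which is the instruction-level locality the approach relies on. The numbers are purely illustrative and say nothing about the reported results.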
