Soft error vulnerability aware process variation mitigation

As transistor process technology approaches the nanometer scale, process variation significantly affects the design and optimization of high performance microprocessors. Prior studies have shown that chip operating frequency and leakage power can have large variations due to fluctuations in transistor gate length and sub-threshold voltage. In this work, we study the impact of process variation on microarchitecture soft error robustness, an increasing reliability design challenge in the billion-transistor chip era. We explore two techniques that can effectively mitigate the effect of design parameter variation while significantly enhancing microarchitecture soft error reliability. Our first technique is entry-based. It tolerates the deleterious impact of variable latency techniques on soft error reliability by reducing the quantity and residency cycle of vulnerable bits in the microarchitecture structure at a fine granularity. Our second technique is structure-based. It applies body biasing schemes to dynamically adapt transistor sub-threshold voltage (and hence device-level soft error robustness) to the program reliability characteristics at a coarse granularity. We also combine the two techniques which further produces improved results. Compared to existing process variation tolerant schemes, our proposed techniques achieve optimal trade-offs between reliability, performance, and power. To our knowledge, this paper presents the first study on characterizing and optimizing processor microarchitecture resilience to soft errors in light of process variation.

[1]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[2]  Rajendran Panda,et al.  Statistical timing analysis using bounds and selective enumeration , 2002, TAU '02.

[3]  Josep Torrellas,et al.  Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[4]  Gu-Yeon Wei,et al.  ReVIVaL: A Variation-Tolerant Architecture Using Voltage Interpolation and Variable Latency , 2008, 2008 International Symposium on Computer Architecture.

[5]  Josep Torrellas,et al.  ReCycle:: pipeline adaptation to tolerate process variation , 2007, ISCA '07.

[6]  Kevin Skadron,et al.  Temperature-aware microarchitecture , 2003, ISCA '03.

[7]  Narayanan Vijaykrishnan,et al.  The effect of threshold voltages on the soft error rate [memory and logic circuits] , 2004, International Symposium on Signals, Circuits and Systems. Proceedings, SCS 2003. (Cat. No.03EX720).

[8]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[9]  D. Sylvester,et al.  Soft Error Reduction in Combinational Logic Using Gate Resizing and Flipflop Selection , 2006, 2006 IEEE/ACM International Conference on Computer Aided Design.

[10]  David M. Brooks,et al.  Mitigating the Impact of Process Variations on Processor Register Files and Execution Units , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[11]  K. Soumyanath,et al.  Impact of body bias on alpha- and neutron-induced soft error rates of flip-flops , 2004, 2004 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.04CH37525).

[12]  James Tschanz,et al.  Parameter variations and impact on circuits and microarchitecture , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[13]  Gu-Yeon Wei,et al.  Process Variation Tolerant 3T1D-Based Cache Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[14]  David Blaauw,et al.  Statistical timing analysis for intra-die process variations with spatial correlations , 2003, ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486).

[15]  Kevin Skadron,et al.  HotLeakage: A Temperature-Aware Model of Subthreshold and Gate Leakage for Architects , 2003 .

[16]  Sudhanva Gurumurthi,et al.  Dynamic prediction of architectural vulnerability from microarchitectural state , 2007, ISCA '07.

[17]  Anand Sivasubramaniam,et al.  SlicK: slice-based locality exploitation for efficient redundant multithreading , 2006, ASPLOS XII.

[18]  Josep Torrellas,et al.  Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors , 2008, 2008 International Symposium on Computer Architecture.

[19]  Kurt Keutzer,et al.  Impact of spatial intrachip gate length variability on theperformance of high-speed digital circuits , 2002, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[20]  Narayanan Vijaykrishnan,et al.  Analysis and solutions to issue queue process variation , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[21]  James D. Meindl,et al.  Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration , 2002, IEEE J. Solid State Circuits.

[22]  David Blaauw,et al.  Modeling and analysis of leakage power considering within-die process variations , 2002, ISLPED '02.

[23]  P. Hazucha,et al.  Impact of CMOS technology scaling on the atmospheric neutron soft error rate , 2000 .

[24]  Vivek De,et al.  Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage , 2002, 2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.02CH37315).

[25]  Chris Wilkerson,et al.  Locality vs. criticality , 2001, ISCA 2001.

[26]  David Blaauw,et al.  Statistical optimization of leakage power considering process variations using dual-Vth and sizing , 2004, Proceedings. 41st Design Automation Conference, 2004..

[27]  Andrew B. Kahng How much variability can designers tolerate? , 2003, IEEE Design & Test of Computers.

[28]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[29]  Sachin S. Sapatnekar,et al.  Full-chip analysis of leakage power under process variations, including spatial correlations , 2005, Proceedings. 42nd Design Automation Conference, 2005..

[30]  Ke Meng,et al.  Process Variation Aware Cache Leakage Management , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[31]  Tao Li,et al.  Characterizing Microarchitecture Soft Error Vulnerability Phase Behavior , 2006, 14th IEEE International Symposium on Modeling, Analysis, and Simulation.

[32]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[33]  Anand Sivasubramaniam,et al.  Mechanisms for bounding vulnerabilities of processor structures , 2007, ISCA '07.

[34]  Xiaodong Li,et al.  Online Estimation of Architectural Vulnerability Factor for Soft Errors , 2008, 2008 International Symposium on Computer Architecture.

[35]  D. Sylvester,et al.  Statistical estimation of leakage current considering inter- and intra-die process variation , 2003, Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003. ISLPED '03..

[36]  Rong Luo,et al.  Modeling the Impact of Process Variation on Critical Charge Distribution , 2006, 2006 IEEE International SOC Conference.

[37]  Josep Torrellas,et al.  Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[38]  James Tschanz,et al.  Impact of Parameter Variations on Circuits and Microarchitecture , 2006, IEEE Micro.