Combined circuit and microarchitecture techniques for effective soft error robustness in SMT processors

As semiconductor technology scales, reliability is becoming an increasingly crucial challenge in microprocessor design. rSRAM and voltage scaling are two promising circuit-level radiation hardening techniques for increasing the soft error robustness of SRAM-based storage cells. However, applying circuit-level radiation hardening techniques to all on-chip transistors would incur significant performance and power overhead. In this paper, we propose microarchitecture support that allows cost-effective implementation of radiation-hardened key microarchitecture structures (e.g., the issue queue and reorder buffer) in SMT processors using soft-error-robust circuit techniques. Our study shows that the combined circuit and microarchitecture techniques achieve attractive tradeoffs among reliability, performance, and power.
