Reliability-aware simultaneous multithreaded architecture using online architectural vulnerability factor estimation

Miniaturisation in modern microprocessors increases their susceptibility to soft errors, which degrades reliability. Simultaneous multithreaded (SMT) architectures have recently been used to improve fault tolerance. Although such schemes provide full coverage, their redundant checking incurs significant performance and energy overheads. Fortunately, some soft errors are masked at the architectural level; the architectural vulnerability factor (AVF) of a structure represents the fraction of soft errors that lead to a failure in the output of a program. In this study, the authors present an infrastructure for online monitoring of the AVF of sensitive structures in an SMT processor. Based on the estimated AVF, they introduce a partial thread redundancy (PTR) protection scheme that is applied to intervals whose AVF is greater than a predefined threshold. The estimated AVF is also used to adapt between reliability improvement and performance enhancement, especially when the processor executes more than one workload. SPEC CPU2006 benchmarks are used to estimate the AVF of important hardware resources such as the issue queue, reorder buffer, load/store queue and register file. Experimental results show that the mean absolute error of the AVF estimation method ranges from 0.04 to 0.07, and that combining online AVF estimation with PTR protection yields reliability-aware execution with lower performance overhead.
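
The two quantities the abstract relies on, the per-interval threshold test that triggers PTR and the mean absolute error of the AVF estimate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the threshold of 0.30 and all AVF values below are hypothetical, chosen only to show the arithmetic.

```python
from typing import List


def select_protected_intervals(avf_estimates: List[float], threshold: float) -> List[bool]:
    """Flag each interval whose estimated AVF is greater than the threshold.

    A True entry means redundancy (PTR in the abstract's terms) would be
    enabled for that interval; False means the interval runs unprotected.
    """
    return [avf > threshold for avf in avf_estimates]


def mean_absolute_error(estimated: List[float], reference: List[float]) -> float:
    """Average of |estimated - reference| over all intervals."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical per-interval AVF values for one structure (not measured data).
estimated = [0.10, 0.35, 0.50, 0.20]   # assumed online estimates
reference = [0.15, 0.30, 0.45, 0.25]   # assumed offline reference values

protect = select_protected_intervals(estimated, threshold=0.30)  # assumed threshold
mae = mean_absolute_error(estimated, reference)

print(protect)
print(round(mae, 2))
```

With these made-up inputs, only the second and third intervals exceed the threshold, so only they would be protected. A higher threshold would protect fewer intervals and favour performance, while a lower one would favour reliability.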
