Exploiting Existing Comparators for Fine-Grained Low-Cost Error Detection

Fault tolerance has become a fundamental concern in computer design, in addition to performance and power. Although several error detection schemes have been proposed to discover a faulty core in the system, these proposals could waste the whole core, including many error-free structures in it after error detection. Moreover, many fault-tolerant designs require additional hardware for data replication or for comparing the replicated data. In this study, we provide a low-cost, fine-grained error detection scheme by exploiting already existing comparators and data replications in the several pipeline stages such as issue queue, rename logic, and translation lookaside buffer. We reduce the vulnerability of the source register tags in IQ by 60%, the vulnerability of instruction TLB by 64%, the vulnerability of data TLB by 45%, and the vulnerability of the register tags of rename logic by 20%.

[1]  Dezsö Sima,et al.  The Design Space of Register Renaming Techniques , 2000, IEEE Micro.

[2]  Todd M. Austin DIVA: A Dynamic Approach to Microprocessor Verification , 2000, J. Instr. Level Parallelism.

[3]  E.S. Fetzer,et al.  The Parity protected, multithreaded register files on the 90-nm itanium microprocessor , 2006, IEEE Journal of Solid-State Circuits.

[4]  Mateo Valero,et al.  FIMSIM: A fault injection infrastructure for microarchitectural simulators , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[5]  Sanjay J. Patel,et al.  Examining ACE analysis reliability estimates using fault-injection , 2007, ISCA '07.

[6]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[7]  Mateo Valero,et al.  Multiple-banked register file architectures , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[8]  Sule Ozev,et al.  Tolerating hard faults in microprocessor array structures , 2004, International Conference on Dependable Systems and Networks, 2004.

[9]  Todd M. Austin,et al.  Efficient dynamic scheduling through tag elimination , 2002, ISCA.

[10]  David J. Sager,et al.  The microarchitecture of the Pentium 4 processor , 2001 .

[11]  Kanad Ghose,et al.  Early Register Deallocation Mechanisms Using Checkpointed Register Files , 2006, IEEE Transactions on Computers.

[12]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[13]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[14]  Pradip Bose,et al.  Exploiting structural duplication for lifetime reliability enhancement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[15]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[16]  Antonio Rubio,et al.  An approach to crosstalk effect analysis and avoidance techniques in digital CMOS VLSI circuits , 1988 .

[17]  Xiaodong Li,et al.  Online Estimation of Architectural Vulnerability Factor for Soft Errors , 2008, 2008 International Symposium on Computer Architecture.

[18]  Sanjay J. Patel,et al.  ReStore: symptom based soft error detection in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[19]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[20]  Kanad Ghose,et al.  Instruction packing: Toward fast and energy-efficient instruction scheduling , 2006, TACO.

[21]  Arijit Biswas,et al.  Computing Accurate AVFs using ACE Analysis on Performance Models: A Rebuttal , 2008, IEEE Computer Architecture Letters.

[22]  Sule Ozev,et al.  A mechanism for online diagnosis of hard faults in microprocessors , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[23]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[24]  Robert Baumann,et al.  Soft errors in advanced computer systems , 2005, IEEE Design & Test of Computers.

[25]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[26]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[27]  天野 英晴 J. L. Hennessy and D. A. Patterson: Computer Architecture: A Quantitative Approach, Morgan Kaufmann (1990)(20世紀の名著名論) , 2003 .

[28]  Albert Meixner,et al.  Error Detection Using Dynamic Dataflow Verification , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[29]  Sarita V. Adve,et al.  Understanding the propagation of hard errors to software and implications for resilient system design , 2008, ASPLOS.

[30]  Amin Ansari,et al.  StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems , 2008, CASES '08.

[31]  J. Baylis Error-correcting Codes , 2014 .

[32]  David A. Patterson,et al.  Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .

[33]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[34]  Gürhan Küçük,et al.  Energy efficient comparators for superscalar datapaths , 2004, IEEE Transactions on Computers.

[35]  Scott A. Mahlke,et al.  BulletProof: a defect-tolerant CMP switch architecture , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[36]  Trevor N. Mudge,et al.  Virtual memory in contemporary microprocessors , 1998, IEEE Micro.

[37]  Daniel J. Sorin,et al.  Core Cannibalization Architecture: Improving lifetime chip performance for multicore processors in the presence of hard faults , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[38]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[39]  Andreas Moshovos Power-Aware Register Renaming , 2022 .

[40]  Gürhan Küçük,et al.  Energy-efficient issue queue design , 2003, IEEE Trans. Very Large Scale Integr. Syst..

[41]  Mikko H. Lipasti,et al.  Half-price architecture , 2003, ISCA '03.

[42]  B. Ramakrishna Rau,et al.  Instruction-level parallel processing: History, overview, and perspective , 2005, The Journal of Supercomputing.

[43]  Oguz Ergin,et al.  Using tag-match comparators for detecting soft errors , 2007, IEEE Computer Architecture Letters.

[44]  Doug Burger,et al.  Exploiting microarchitectural redundancy for defect tolerance , 2003, 2012 IEEE 30th International Conference on Computer Design (ICCD).

[45]  Steve Keckler,et al.  Proceedings of the 36th annual international symposium on Computer architecture , 2009, ISCA 2009.