Core Reliability: Leveraging Hardware Transactional Memory

Modern microprocessors are more vulnerable to transient faults or soft errors than ever before due to design trends mandating low supply voltage and reduced noise margins, shrinking feature sizes and increased transistor density for fast, low-power circuits. As industry now supports Hardware Transactional Memory (HTM), the features of HTM can be leveraged to add core resiliency to transient errors. In this paper, we propose a novel microarchitecture for transient error detection and recovery based on time redundancy and backward error recovery leveraging HTM's existing features especially its rollback mechanism. We provide implementation details for single-core reliability, minimizing additions to existing HTM supports. We evaluate the performance overheads of the single core with the reliability feature by comparing it to the base machine without the reliability feature. Finally we show how single-core reliability can be extended to multi-core reliability.

[1]  Mateo Valero,et al.  EazyHTM: EAger-LaZY hardware Transactional Memory , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Josep Torrellas,et al.  BulkSC: bulk enforcement of sequential consistency , 2007, ISCA '07.

[3]  David A. Wood,et al.  LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[4]  Maurice Herlihy,et al.  Virtualizing transactional memory , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[5]  David A. Wood,et al.  LogTM: log-based transactional memory , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[6]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  Robert Baumann,et al.  Soft errors in advanced computer systems , 2005, IEEE Design & Test of Computers.

[8]  Josep Torrellas,et al.  Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[9]  Babak Falsafi,et al.  Fingerprinting: bounding soft-error-detection latency and bandwidth , 2004, IEEE Micro.

[10]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[11]  Michael L. Scott,et al.  Flexible Decoupled Transactional Memory Support , 2008, 2008 International Symposium on Computer Architecture.

[12]  Kunle Olukotun,et al.  Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[13]  J. T. Robinson,et al.  On optimistic methods for concurrency control , 1979, TODS.

[14]  Marc Lupon,et al.  A Dynamically Adaptable Hardware Transactional Memory , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[15]  Kunle Olukotun,et al.  A Scalable, Non-blocking Approach to Transactional Memory , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[16]  Timothy J. Slegel,et al.  IBM's S/390 G5 microprocessor design , 1999, IEEE Micro.

[17]  Milo M. K. Martin,et al.  Subtleties of transactional memory atomicity semantics , 2006, IEEE Computer Architecture Letters.

[18]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[19]  Maurice Herlihy,et al.  Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.

[20]  Josep Torrellas,et al.  ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors , 2002, ISCA.

[21]  Jack K. Wolf,et al.  On the Probability of Undetected Error for Linear Block Codes , 1982, IEEE Trans. Commun..

[22]  Bradley C. Kuszmaul,et al.  Unbounded Transactional Memory , 2005, HPCA.

[23]  Josep Torrellas,et al.  ScalableBulk: Scalable Cache Coherence for Atomic Blocks in a Lazy Environment , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[24]  Maged M. Michael,et al.  Evaluation of Blue Gene/Q hardware support for transactional memories , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[25]  Per Stenström,et al.  ZEBRA: a data-centric, hybrid-policy hardware transactional memory design , 2011, ICS '11.

[26]  Irith Pomeranz,et al.  Transient-Fault Recovery for Chip Multiprocessors , 2003, IEEE Micro.