Transparent dynamic binding with fault-tolerant cache coherence protocol for chip multiprocessors

Aggressive technology scaling causes chip multiprocessors increasingly error-prone. Core-level fault-tolerant approaches bind two cores to implement redundant execution and error detection. However, along with more cores integrated into one chip, existing static and dynamic binding schemes suffer from the scalability problem when considering the violation effects caused by external write operations. In this paper, we present a transparent dynamic binding (TDB) mechanism to address the issue. Learning from static binding schemes, we involve the private caches to hold identical data blocks, thus we reduce the global masters-lave consistency maintenance to the scale of the private caches. With our fault-tolerant cache coherence protocol, TDB satisfies the objective of private cache consistency, therefore provides excellent scalability and flexibility. Experimental results show that, for a set of parallel workloads, the overall performance of our TDB scheme is very close to that of baseline fault-tolerant systems, outperforming dynamic core coupling by 9.2%, 10.4%, 18% and 37.1% when considering 4, 8, 16 and 32 cores respectively.

[1]  John B. Carter,et al.  An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[2]  Todd M. Austin,et al.  DIVA: a reliable substrate for deep submicron microarchitecture design , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[3]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[4]  Xiaodong Li,et al.  Online Estimation of Architectural Vulnerability Factor for Soft Errors , 2008, 2008 International Symposium on Computer Architecture.

[5]  Yu Hu,et al.  IVF: characterizing the vulnerability of microprocessor structures to intermittent faults , 2010, DATE 2010.

[6]  Milo M. K. Martin,et al.  Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.

[7]  Michael C. Huang,et al.  Improving support for locality and fine-grain sharing in chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[8]  David I. August,et al.  Design and evaluation of hybrid fault-detection systems , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[9]  Onur Mutlu,et al.  Performance-aware speculation control using wrong path usefulness prediction , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.

[10]  Eric Rotenberg,et al.  AR-SMT: a microarchitectural approach to fault tolerance in microprocessors , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).

[11]  Juan L. Aragón,et al.  Evaluating Dynamic Core Coupling in a Scalable Tiled-CMP Architecture , 2008 .

[12]  Babak Falsafi,et al.  Dual use of superscalar datapath for transient-fault detection and recovery , 2001, MICRO.

[13]  Dean M. Tullsen,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[14]  Babak Falsafi,et al.  Efficient Resource Sharing in Concurrent Error Detecting Superscalar Microarchitectures , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[15]  Kewal K. Saluja,et al.  Energy-efficient fault tolerance in chip multiprocessors using Critical Value Forwarding , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[16]  Irith Pomeranz,et al.  Transient-Fault Recovery for Chip Multiprocessors , 2003, IEEE Micro.

[17]  Amin Ansari,et al.  Shoestring: probabilistic soft error reliability on the cheap , 2010, ASPLOS XV.

[18]  Engin Ipek,et al.  Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[19]  Babak Falsafi,et al.  Reunion: Complexity-Effective Multicore Redundancy , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[20]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.

[21]  Onur Mutlu,et al.  Microarchitecture-based introspection: a technique for transient-fault tolerance in microprocessors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[22]  Babak Falsafi,et al.  Fingerprinting: bounding soft-error-detection latency and bandwidth , 2004, IEEE Micro.

[23]  Keith A. Bowman,et al.  Circuit techniques for dynamic variation tolerance , 2009, 2009 46th ACM/IEEE Design Automation Conference.

[24]  Qiang Xu,et al.  Performance yield-driven task allocation and scheduling for MPSoCs under process variation , 2010, Design Automation Conference.

[25]  J. Martin-Martinez,et al.  Gate Oxide Wear-Out and Breakdown Effects on the Performance of Analog and Digital Circuits , 2008, IEEE Transactions on Electron Devices.

[26]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[27]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[28]  Pradeep Dubey,et al.  Platform 2015: Intel ® Processor and Platform Evolution for the Next Decade , 2005 .

[29]  David I. August,et al.  Automatic Instruction-Level Software-Only Recovery , 2006, IEEE Micro.

[30]  Onur Mutlu,et al.  Understanding the effects of wrong-path memory references on processor performance , 2004, WMPI '04.

[31]  Shubhendu S. Mukherjee,et al.  Transient fault detection via simultaneous multithreading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[32]  Augustus K. Uht,et al.  The impact of wrong-path memory references in cache-coherent multiprocessor systems , 2007, J. Parallel Distributed Comput..

[33]  Mats Brorsson,et al.  An adaptive cache coherence protocol optimized for migratory sharing , 1993, ISCA '93.

[34]  Niraj K. Jha,et al.  GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[35]  Shubhendu S. Mukherjee,et al.  Detailed design and evaluation of redundant multithreading alternatives , 2002, ISCA.

[36]  David I. August,et al.  SWIFT: software implemented fault tolerance , 2005, International Symposium on Code Generation and Optimization.