Paper information - Necromancer: enhancing system throughput by animating dead cores

Aggressive technology scaling into the nanometer regime has led to a host of reliability challenges in the last several years. Unlike on-chip caches, which can be efficiently protected using conventional schemes, the general core area is less homogeneous and structured, making tolerating defects a much more challenging problem. Due to the lack of effective solutions, disabling non-functional cores is a common practice in industry to enhance manufacturing yield, which results in a significant reduction in system throughput. Although a faulty core cannot be trusted to correctly execute programs, we observe in this work that for most defects, when starting from a valid architectural state, execution traces on a defective core actually coarsely resemble those of fault-free executions. In light of this insight, we propose a robust and heterogeneous core coupling execution scheme, Necromancer, that exploits a functionally dead core to improve system throughput by supplying hints regarding high-level program behavior. We partition the cores in a conventional CMP system into multiple groups in which each group shares a lightweight core that can be substantially accelerated using these execution hints from a potentially dead core. To prevent this undead core from wandering too far from the correct path of execution, we dynamically resynchronize architectural state with the lightweight core. For a 4-core CMP system, on average, our approach enables the coupled core to achieve 78.5% of the performance of a fully functioning core. This defect tolerance and throughput enhancement comes at modest area and power overheads of 5.3% and 8.5%, respectively.
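To put the 78.5% figure in context, the following back-of-the-envelope sketch compares chip-level throughput when a defective core is simply disabled against when it is coupled as described above. It is an illustration, not the paper's evaluation methodology: the function name `relative_throughput` is hypothetical, and the calculation assumes exactly one defective core in the 4-core CMP, equal throughput from every healthy core, and no accounting for the reported area and power overheads.

```python
def relative_throughput(total_cores, dead_cores, recovered_fraction):
    """Chip throughput relative to a fully functional chip.

    Assumption (not from the paper): every healthy core contributes one
    unit of throughput, and each dead core contributes `recovered_fraction`
    of a unit (0.0 when it is simply disabled).
    """
    healthy = total_cores - dead_cores
    return (healthy + dead_cores * recovered_fraction) / total_cores


# 4-core CMP with one defective core (the one-defect case is an assumption).
disabled = relative_throughput(4, 1, 0.0)    # dead core switched off
coupled = relative_throughput(4, 1, 0.785)   # 78.5% figure from the abstract

print(f"dead core disabled: {disabled:.1%}")
print(f"dead core coupled:  {coupled:.1%}")
```

Under these assumptions, disabling the core leaves 75.0% of nominal throughput, while coupling recovers roughly 94.6%.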
