Trading off area, yield and performance via hybrid redundancy in multi-core architectures

Manufacturing yield is a major concern for modern CMOS technologies. Fortunately, evolving chip architectures such as multi-cores provide new avenues for yield enhancement and call for a fresh perspective on the classic method of redundancy insertion. In this paper we outline a new approach to redundancy insertion in modern multi-core CPU architectures. Traditionally, applying redundancy at a finer, intra-core level of granularity provides large yield improvements, but it requires additional steering logic and wiring that hurt both area and performance. At the other end of the spectrum, coarse-grained core-level redundancy can enable spare sharing, but it is beneficial only in highly parallel GPU architectures. To address this, we 1) introduce a hybrid spare-sharing redundancy insertion scheme that combines the advantages of the two approaches while carefully managing the associated area and performance overheads, 2) present an extensively verified, systematic, and scalable model that evaluates the quality of the final design in terms of projected revenue per wafer, and 3) introduce a maximization algorithm that determines near-optimal redundancy configurations during the design stage. Experimental results show that our design methodology improves revenue per wafer by more than 15% compared with existing redundancy insertion techniques.
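To make the trade-off between spare area, yield, and revenue per wafer concrete, the following is a minimal Python sketch. It is not the model or the maximization algorithm described above. It assumes a Poisson defect model with independent module failures, spares shared per module type across all cores on a die, a flat area penalty and a flat price penalty for the steering logic, and an exhaustive search standing in for the optimization step. All function names (`chip_yield`, `revenue_per_wafer`, `best_spare_config`), parameters (`steering_overhead`, `perf_discount`), and numeric values are hypothetical.

```python
import math
from itertools import product

# All parameter values below are illustrative assumptions, not data from the paper.
WAFER_AREA_MM2 = math.pi * 150.0 ** 2  # 300 mm wafer; edge loss is ignored


def module_yield(area_mm2: float, d0: float) -> float:
    """Poisson yield: probability that a module of this area has no killer defect."""
    return math.exp(-area_mm2 * d0)


def prob_at_most_k_faulty(n: int, p_fail: float, k: int) -> float:
    """P(at most k of n independent module instances are faulty), a binomial sum."""
    return sum(math.comb(n, j) * p_fail ** j * (1.0 - p_fail) ** (n - j)
               for j in range(min(k, n) + 1))


def chip_yield(module_areas, n_cores, spares, d0):
    """Hypothetical spare-sharing rule: the chip is sellable if, for every module
    type, at least n_cores of the n_cores + s instances (originals plus shared
    spares) are defect-free."""
    y = 1.0
    for area, s in zip(module_areas, spares):
        p_fail = 1.0 - module_yield(area, d0)
        y *= prob_at_most_k_faulty(n_cores + s, p_fail, s)
    return y


def revenue_per_wafer(module_areas, n_cores, spares, d0, price=100.0,
                      steering_overhead=0.05, perf_discount=0.02):
    """Expected revenue per wafer = dies per wafer * chip yield * price.
    Assumed overheads: if any spare is inserted, steering logic grows the die
    area by `steering_overhead` and its extra delay lowers the selling price by
    `perf_discount`. The steering logic itself is treated as defect-free."""
    die_area = (n_cores * sum(module_areas)
                + sum(a * s for a, s in zip(module_areas, spares)))
    if any(spares):
        die_area *= 1.0 + steering_overhead
        price *= 1.0 - perf_discount
    dies_per_wafer = math.floor(WAFER_AREA_MM2 / die_area)
    return dies_per_wafer * chip_yield(module_areas, n_cores, spares, d0) * price


def best_spare_config(module_areas, n_cores, d0, max_spares=3):
    """Exhaustive search over per-module-type spare counts (small spaces only)."""
    best = None
    for spares in product(range(max_spares + 1), repeat=len(module_areas)):
        rev = revenue_per_wafer(module_areas, n_cores, spares, d0)
        if best is None or rev > best[1]:
            best = (spares, rev)
    return best


if __name__ == "__main__":
    areas = [4.0, 2.0, 6.0]  # hypothetical per-core module areas in mm^2
    cores, d0 = 8, 0.005     # 8 cores, 0.005 defects/mm^2 (0.5 per cm^2)
    baseline = revenue_per_wafer(areas, cores, (0, 0, 0), d0)
    spares, rev = best_spare_config(areas, cores, d0)
    print(f"no spares: {baseline:.0f}, best spares {spares}: {rev:.0f} "
          f"({100 * (rev / baseline - 1):.1f}% change)")
```

Raising `steering_overhead` or `perf_discount` pushes the optimum toward fewer spares, which mirrors the area and performance cost of fine-grained redundancy. The exhaustive search grows as (max_spares + 1)^k in the number of module types k, so it only suits a handful of module types; a larger design space would need a smarter search.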
