Maximizing Yield per Area of Highly Parallel CMPs Using Hardware Redundancy

The manufacturing yield of chip multiprocessors (CMPs) has become a significant problem as more transistors are integrated onto a single die and defect rates keep increasing for “end-of-Moore” nanoscale CMOS technologies. Because such CMP designs usually have significant structural symmetry, adding spare copies to such designs should be an effective method for increasing yield per area, as it is for memories. However, a systematic approach for adding spare copies to optimize CMP yield per area has never been developed, primarily due to the lack of: 1) a general model of CMP architectures and 2) a practically usable model for computing the areas of chip versions with different configurations of spare copies. This paper develops both models and, in conjunction with a systematic approach for enumerating a wide range of spare configurations, uses them to compute the area overhead and yield of each configuration. In particular, this paper proposes a general spare-core sharing technique that maximizes the yield per area of any CMP by efficiently traversing the design space for adding spare cores. Experimental results show that the advantage of the proposed approach over traditional approaches increases with continued technology scaling. Specifically, the proposed approach achieves \(2\times\) the yield per area of previous approaches for 32 nm and 22 nm technologies. In addition, the yield-per-area values obtained by our approach are around 70% of those obtained in the ideal scenario where defect density is zero and no redundancy is added.
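To make the yield-per-area trade-off concrete, the following is a minimal sketch and not the model developed in the paper. It assumes a simple Poisson defect model, statistically independent cores, defect-free non-core logic, and spares that can replace any faulty core. All function names and the `overhead_per_spare` parameter are hypothetical and introduced only for illustration.

```python
from math import comb, exp


def core_yield(defect_density: float, core_area: float) -> float:
    """Assumed Poisson model: probability that one core has no defects."""
    return exp(-defect_density * core_area)


def chip_yield(n_required: int, n_spares: int, y_core: float) -> float:
    """Probability that at least n_required of all fabricated cores are good.

    Assumes cores fail independently and any spare can replace any core.
    """
    total = n_required + n_spares
    return sum(
        comb(total, k) * y_core**k * (1.0 - y_core) ** (total - k)
        for k in range(n_required, total + 1)
    )


def yield_per_area(n_required, n_spares, defect_density, core_area,
                   overhead_per_spare=0.0):
    """Chip yield divided by chip area for one spare configuration.

    overhead_per_spare is a hypothetical stand-in for the extra wiring
    and switching area that each spare core would need.
    """
    y_core = core_yield(defect_density, core_area)
    area = (n_required + n_spares) * core_area + n_spares * overhead_per_spare
    return chip_yield(n_required, n_spares, y_core) / area


def best_spare_count(n_required, max_spares, defect_density, core_area,
                     overhead_per_spare=0.0):
    """Exhaustively pick the spare count with the highest yield per area."""
    return max(
        range(max_spares + 1),
        key=lambda s: yield_per_area(
            n_required, s, defect_density, core_area, overhead_per_spare
        ),
    )
```

Under these assumptions, calling `best_spare_count(16, 8, 2.0, 0.1)` searches zero to eight spares for a 16-core chip with a core area of 0.1 and a defect density of 2.0 in matching, arbitrary units. With zero defect density the search returns zero spares, since every spare then only adds area.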
