Modeling Yield, Cost, and Quality of a Spare-Enhanced Multicore Chip

Achieving a high manufacturing yield for multicore chips is becoming increasingly difficult because of larger chip sizes, higher device densities, and greater failure rates. Adding a limited number of spare cores and wires to replace defective cores and wires, either before shipment or in the field, can significantly improve the effective yield of the chip and reduce its overall cost. In this paper, we first model the yield of a multicore chip that incorporates both spare cores and spare wires. We then propose a quality metric for an NoC and model the system yield subject to a given quality constraint. We also model the manufacturing and service costs of a multicore chip and show that a spare scheme can significantly improve the quality, increase the yield, reduce the overall cost, and substitute for the burn-in process. We illustrate that, in a spare-enhanced system on a chip with high-quality in-field recovery capability, the reliance on high-quality manufacturing testing can be significantly reduced. We also demonstrate that the overall quality of a mesh-based NoC depends more on the reliability of the inner links than on that of the outer links; therefore, a nonuniform spare wire distribution is sometimes more effective and cost-efficient than a uniform one.
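
To make the effect of spare cores on yield concrete, the following is a minimal sketch and not the model developed in the paper. It assumes that core defects are independent and that every core has the same probability `core_yield` of being defect-free. It ignores spare wires, defect clustering, and the NoC quality constraint. The function name `chip_yield` and its parameters are hypothetical, introduced only for illustration. Under these assumptions, a chip with `required` cores and `spares` extra cores is good when at least `required` of the fabricated cores work, which is a binomial tail sum.

```python
from math import comb


def chip_yield(required: int, spares: int, core_yield: float) -> float:
    """Probability that at least `required` of `required + spares` cores work.

    Assumption (not from the paper): cores fail independently, each being
    defect-free with probability `core_yield`.
    """
    total = required + spares
    # Sum the binomial probabilities of exactly k good cores, k >= required.
    return sum(
        comb(total, k) * core_yield**k * (1.0 - core_yield) ** (total - k)
        for k in range(required, total + 1)
    )


if __name__ == "__main__":
    # Compare a 4-core chip with no spare against the same chip with one spare.
    print(chip_yield(4, 0, 0.9))
    print(chip_yield(4, 1, 0.9))
```

With a per-core yield of 0.9, the four-core chip without spares has a yield of about 0.656 under this simplified model, and one spare core raises it to about 0.919. The numbers are illustrative only and are not results from the paper.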
