Report on power, thermal and reliability prediction for 3D Networks-on-Chip

Combining three-dimensional integrated circuits (3D-ICs) with the Network-on-Chip infrastructure yields 3D Networks-on-Chip (3D-NoCs), an on-chip communication paradigm that offers lower power, a smaller footprint and lower latency. However, thermal dissipation is one of the most critical challenges for 3D-ICs, because heat cannot easily transfer through several layers of silicon. Consequently, high-temperature areas also face a reliability threat, as the Mean Time to Failure (MTTF) decreases exponentially with the operating temperature. 3D-NoCs must tackle this fundamental problem in order to be widely used. Therefore, in this work, we investigate the thermal distribution and reliability prediction of 3D-NoCs. We first present a new method to simulate the temperature (both steady-state and transient) using traffic values from realistic and synthetic benchmarks and the power consumption obtained from a standard VLSI design flow. Then, based on the proposed method, we further predict the relative reliability of different parts of the network. Experimental results show that the method has an extremely fast execution time compared with accelerated lifetime testing. Furthermore, we compare the thermal behavior and reliability of monolithic and TSV-based designs. We also explore the possibility of implementing a thermal via mechanism to help reduce the operating temperature.
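
The exponential dependence of MTTF on temperature can be illustrated with a small sketch. It assumes an Arrhenius-type temperature term (as used in Black's electromigration model) and is not necessarily the model used in this work. The activation energy of 0.7 eV, the layer names and the layer temperatures are hypothetical values chosen only for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def relative_mttf(temp_k, ref_temp_k, activation_energy_ev=0.7):
    """Return MTTF(temp_k) / MTTF(ref_temp_k).

    Assumes MTTF is proportional to exp(Ea / (k * T)) and that every
    other factor (e.g. current density) is identical at both points.
    The default activation energy is an illustrative assumption.
    """
    if temp_k <= 0 or ref_temp_k <= 0:
        raise ValueError("temperatures must be positive and in kelvin")
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / temp_k - 1.0 / ref_temp_k
    )
    return math.exp(exponent)


# Hypothetical steady-state temperatures (kelvin) for three stacked layers.
layer_temps_k = {"layer0": 330.0, "layer1": 345.0, "layer2": 355.0}

# Use the coolest layer as the reference, so its relative MTTF is 1.0.
reference_k = min(layer_temps_k.values())
for name, temp in layer_temps_k.items():
    print(f"{name}: relative MTTF = {relative_mttf(temp, reference_k):.3f}")
```

Because only ratios are computed, the technology-dependent prefactor cancels out. This matches the idea of comparing the relative reliability of different parts of the network rather than predicting absolute lifetimes.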
