Variability-Aware Thermal Simulation using CNNs

With rising power densities in modern electronic systems, temperature has emerged as a fundamental design constraint. This has led to a range of thermal-aware design and runtime management techniques, all of which depend heavily on a fast and accurate thermal modeling method. Such a method must account for manufacturing variability, which significantly impacts the chip's power and performance. Leakage power also accounts for a substantial portion of the total power. Thus, a thermal modeling method can be accurate only if it incorporates the effects of both process variation and leakage power. In this paper, we propose a simple and elegant residual convolutional neural network for thermal estimation in the presence of variability, which leverages the physics of heat transfer. Our approach can model modern 3D chips with microchannels and incorporates accurate leakage power models. To enable ultra-fast thermal estimation, we implement our technique on a GPU. Our experiments show that our technique is orders of magnitude faster than the state of the art, with similar, if not better, accuracy. The mean absolute error of our technique is 0.61°C for a maximum temperature rise of 67.5°C (0.9%).
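The idea that a residual CNN can leverage the physics of heat transfer can be illustrated with a toy case. This is a sketch under simplifying assumptions, not the network proposed in this paper. Discretize the steady-state 2D heat equation on a uniform grid with spacing h and conductivity k, holding the die edges at ambient. The equation becomes L∗T + s = 0, where L is the fixed 3×3 five-point Laplacian kernel and s = h²q/k. One Jacobi relaxation step is then T ← T + ¼(L∗T) + ¼s. This has exactly the form of a residual block with a single fixed convolution, so stacking such blocks amounts to iterating the solver. The Python sketch below hard-codes this kernel. The function names, the 7×7 grid, and the edges-at-ambient boundary condition are hypothetical choices made for illustration. A trained network would instead learn its kernels and could take leakage and variation maps as additional inputs.

```python
"""Toy illustration (hypothetical, not the paper's network): a residual
update with a fixed Laplacian kernel is one Jacobi step of the steady-state
2D heat equation  Laplacian(T) + s = 0,  with s = h^2 * q / k."""
from typing import List

Grid = List[List[float]]

# Fixed 3x3 five-point Laplacian stencil used as a convolution kernel.
LAPLACIAN: Grid = [[0.0, 1.0, 0.0],
                   [1.0, -4.0, 1.0],
                   [0.0, 1.0, 0.0]]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """3x3 'same' cross-correlation with zero padding.

    Zero padding means cells outside the grid have zero temperature rise,
    i.e. the die edges are held at ambient (a simplifying assumption).
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(3):
                for b in range(3):
                    ii, jj = i + a - 1, j + b - 1
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += k[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def residual_step(t: Grid, source: Grid) -> Grid:
    """Residual update t + F(t), with F(t) = Laplacian(t)/4 + source/4.

    Expanding it gives the Jacobi update (sum of 4 neighbours + source) / 4.
    """
    lap = conv2d_same(t, LAPLACIAN)
    return [[t[i][j] + 0.25 * lap[i][j] + 0.25 * source[i][j]
             for j in range(len(t[0]))] for i in range(len(t))]


def steady_state(source: Grid, steps: int = 500) -> Grid:
    """Stack `steps` identical residual blocks, starting from zero rise."""
    t = [[0.0] * len(source[0]) for _ in source]
    for _ in range(steps):
        t = residual_step(t, source)
    return t


if __name__ == "__main__":
    n = 7
    q = [[0.0] * n for _ in range(n)]
    q[3][3] = 1.0  # single hotspot in the middle of the die
    temp = steady_state(q)
    print(f"peak rise: {temp[3][3]:.4f}, corner rise: {temp[0][0]:.4f}")
```

For this 7×7 grid, 500 stacked steps are enough for the iteration to converge. At that point the Laplacian of the result balances the source at every cell, and the rise peaks at the hotspot. Real chips lose most of their heat vertically, through the heat sink and, in 3D stacks, through microchannels. This 2D toy omits those paths.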
