Hot Spot Identification and System Parameterized Thermal Modeling for Multi-Core Processors Through Infrared Thermal Imaging

Accurate thermal models suitable for system-level dynamic thermal, power, and reliability management are vital for many commercial multi-core processors. However, developing such models and identifying the thermally and power-relevant spatial locations on a commercial processor is challenging because of the lack of design information and available tools. Existing tools such as HotSpot-like thermal models may be too inaccurate or too inefficient for online applications: most rely on parameters that cannot be precisely quantified, such as power traces, while others are numerical methods unsuitable for runtime use. In this work, we propose a novel approach to automatically detecting the major heat sources on a commercial multi-core microprocessor using an infrared thermal imaging setup. The approach applies a 2D discrete cosine transform (DCT) filter to reduce noise in the measured thermal maps, followed by a Laplacian transformation and K-means clustering to identify the heat sources. Since the identified heat sources are the thermally vulnerable areas of the die, we also propose a novel approach to deriving a thermal model capable of predicting their temperatures at runtime. We apply Long Short-Term Memory (LSTM) networks to build a dynamic thermal model that takes system-level variables such as chip frequency, voltage, and instruction count as inputs. The model is trained and tested exclusively on measured thermal data from a commercial multi-core processor. Experimental results show that the proposed thermal model predicts the temperature of all the identified heat sources on the chip with a root-mean-square error (RMSE) of 2.04°C to 2.57°C.
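
The heat-source identification steps named above (DCT filtering, Laplacian transformation, K-means clustering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the low-pass cutoff `keep`, the `top_fraction` threshold, and the plain Lloyd-style K-means are all assumptions made for illustration. The sketch relies on the steady-state heat equation, in which power density is proportional to the negative Laplacian of temperature, so strongly positive values of the negative Laplacian mark candidate heat sources.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct_lowpass(thermal_map, keep):
    """Denoise by keeping only the keep x keep lowest-frequency 2D DCT coefficients."""
    rows, cols = thermal_map.shape
    cr, cc = dct_matrix(rows), dct_matrix(cols)
    coeffs = cr @ thermal_map @ cc.T          # forward 2D DCT
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0                  # assumed cutoff, tune per setup
    return cr.T @ (coeffs * mask) @ cc        # inverse 2D DCT

def neg_laplacian(t):
    """Negative 5-point discrete Laplacian with replicated borders."""
    p = np.pad(t, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * t
    return -lap

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means on an (N, 2) array of pixel coordinates."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def find_heat_sources(thermal_map, k, keep=12, top_fraction=0.05):
    """Return k estimated heat-source centers as (row, col) coordinates."""
    smooth = dct_lowpass(thermal_map, keep)
    score = neg_laplacian(smooth)
    # Keep only the pixels with the strongest source-like response (assumed rule).
    threshold = np.quantile(score, 1.0 - top_fraction)
    candidates = np.argwhere(score >= threshold).astype(float)
    centers, _ = kmeans(candidates, k)
    return centers
```

Calling `find_heat_sources(thermal_map, k)` on a 2D temperature array returns `k` cluster centers in pixel coordinates. The number of clusters `k` must be supplied by the caller in this sketch; how it is chosen in the proposed method is not stated here.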
