Transient and Peak Temperature Computation Based on Matrix Exponentials (MatEx)

Resource management techniques typically rely on runtime and design-time management decisions to optimize the usage of the available resources, e.g., mapping new application tasks/threads to cores, migrating tasks/threads among cores, scheduling tasks on individual cores, activating/deactivating cores, and changing the Dynamic Voltage and Frequency Scaling (DVFS) levels. Among the existing techniques in the literature, several power budgeting and thermal management techniques are derived/formulated for steady-state temperatures. Nevertheless, management decisions change the power consumption throughout the chip, which can in turn result in transient temperatures that are much higher than those of any expected steady-state scenario. If this occurs and the transient temperatures exceed the critical threshold temperature, some Dynamic Thermal Management (DTM) technique is activated on the chip to guarantee that it is not damaged. However, very frequent triggers of aggressive DTM techniques may degrade the overall performance of the system in a manner that is unpredictable from the perspective of the resource management techniques. Most importantly, chips could also be seriously damaged if, in some cases, the transient temperatures rise faster than DTM can react to them. For the system to operate in thermally safe ranges and behave predictably, resource management techniques could thus benefit from evaluating (i.e., estimating or predicting) such transient temperature peaks when making management decisions. In this chapter, we introduce a lightweight and accurate method for computing the peaks in transient temperatures at runtime. Our technique, called MatEx, is suitable for any compact thermal model that consists of a system of first-order differential equations, e.g., a thermal model based on RC thermal networks (like the one used by HotSpot).
Most existing state-of-the-art techniques/tools for temperature computation/estimation/prediction use standard numerical methods to solve such a system of first-order differential equations. Although some of these techniques are reasonably efficient, they cannot compute only the peaks in temperature during the transient state. Instead, these peaks must be extracted from extensive simulations over many time steps, which sometimes take several seconds to compute. In contrast, MatEx is based on an analytical solution using matrix exponentials and linear algebra, which results in a mathematical expression that can be easily analyzed and differentiated in order to compute only the peaks in transient temperatures. Moreover, given that MatEx is based on an exact solution that is a function of time, it can also be used to efficiently compute any future transient temperature without accuracy losses, making it a potential replacement for existing temperature estimation tools.
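
To make the idea concrete, the following is a minimal sketch of a closed-form transient solution for a small RC thermal network. It is not the MatEx implementation. It assumes the common RC-network form A dT/dt + B T = P + T_amb G, with A a diagonal matrix of thermal capacitances, B a symmetric conductance matrix, and G the vector of conductances to ambient. Under that assumption the solution is T(t) = T_steady + exp(C t)(T_init - T_steady), with C = -A^{-1} B, and the matrix exponential is evaluated through an eigendecomposition. All symbols, function names, and numerical values (a hypothetical three-node network with two cores and a heat sink) are illustrative and do not come from the text above. The peak is located here by scanning the closed form on a time grid, which is a simple stand-in for differentiating the expression and solving for its roots.

```python
import numpy as np


def steady_state(B, P, g_amb, t_amb):
    """Steady-state temperatures: solve B T = P + T_amb * G."""
    return np.linalg.solve(B, P + g_amb * t_amb)


def closed_form(cap, B, P, g_amb, t_amb, T_init):
    """Return (T, T_ss), where T(t) evaluates the exact transient solution.

    cap holds the diagonal of A. Because B is symmetric positive definite,
    A^{-1} B is similar to the symmetric matrix S = A^{-1/2} B A^{-1/2},
    so its eigenvalues are real and positive and eigh can be used.
    """
    T_ss = steady_state(B, P, g_amb, t_amb)
    s = 1.0 / np.sqrt(cap)
    S = s[:, None] * B * s[None, :]           # A^{-1/2} B A^{-1/2}
    mu, Q = np.linalg.eigh(S)                 # S = Q diag(mu) Q^T
    V = s[:, None] * Q                        # eigenvectors of A^{-1} B
    V_inv = Q.T * np.sqrt(cap)[None, :]       # inverse of V
    coeff = V_inv @ (T_init - T_ss)           # initial offset in the eigenbasis

    def T(t):
        # exp(C t) = V diag(exp(-mu t)) V^{-1}, applied to (T_init - T_ss)
        return T_ss + V @ (np.exp(-mu * t) * coeff)

    return T, T_ss


# Hypothetical network: nodes 0 and 1 are cores, node 2 is the heat sink.
cap = np.array([0.03, 0.03, 10.0])            # thermal capacitances [J/K]
B = np.array([[1.2, -0.2, -1.0],
              [-0.2, 1.2, -1.0],
              [-1.0, -1.0, 2.5]])             # conductances [W/K]
g_amb = np.array([0.0, 0.0, 0.5])             # sink-to-ambient conductance
t_amb = 45.0                                  # ambient temperature [deg C]

P_old = np.array([10.0, 10.0, 0.0])           # power before the decision [W]
P_new = np.array([16.0, 0.0, 0.0])            # power after the decision [W]

T_init = steady_state(B, P_old, g_amb, t_amb) # chip starts at the old steady state
T, T_ss = closed_form(cap, B, P_new, g_amb, t_amb, T_init)

# Grid scan of the closed form for core 0 (stand-in for a root search on dT/dt).
times = np.linspace(0.0, 5.0, 5001)
core0 = np.array([T(t)[0] for t in times])
k = int(np.argmax(core0))
t_peak, peak = times[k], core0[k]

print(f"core 0: initial {T_init[0]:.2f}, peak {peak:.2f} at t = {t_peak:.3f} s, "
      f"steady state {T_ss[0]:.2f}")
```

In this hypothetical scenario the total power drops, so the new steady-state temperature of core 0 is lower than its initial temperature, yet the core briefly overshoots both because the heat sink is still hot from the previous power distribution. A steady-state-only analysis would miss this transient peak.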
