Thermal prediction and adaptive control through workload phase detection

Elevated die temperature is a true limiter of the scalability of modern processors. As technology scaling continues in order to meet ever-increasing performance demands, it is no longer cost-effective to design cooling systems that handle worst-case thermal behavior. Instead, cooling systems are designed for typical chip operation, and processors must detect and handle rare thermal emergencies. Most processors rely on measurements from integrated thermal sensors and on dynamic thermal management (DTM) techniques to manage the trade-off between performance and thermal risk. Optimal management requires advance knowledge of the thermal trajectory, based on the current workload behavior and operating conditions. In this work, we devise novel workload phase classification strategies that automatically discriminate among workload behaviors with respect to the thermal control response. We incorporate workload phase detection and thermal models into a dynamic voltage and frequency scaling (DVFS) technique that can optimally control temperature at runtime based on thermal predictions. We demonstrate the effectiveness of the proposed techniques in predicting and adaptively controlling the thermal behavior of a real quad-core processor in response to a wide range of workloads. In comparison with state-of-the-art model predictive control (MPC) techniques from previous work on thermal prediction, we demonstrate a 5.8% improvement in instruction throughput with the same number of thermal violations. In comparison with simple proportional-integral (PI) feedback control techniques, we improve instruction throughput by 3.9% while significantly reducing the number of thermal violations.
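The general idea of combining phase detection, a thermal model, and DVFS can be illustrated with a minimal sketch. It is not the method of this work: the single-counter phase classifier, the per-phase linear model T_next = a*T + b*f + c, and every name and coefficient below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseModel:
    """Hypothetical per-phase linear thermal model: T_next = a*T + b*f + c."""
    a: float
    b: float
    c: float

    def predict(self, temp_c: float, freq_ghz: float) -> float:
        # One-step-ahead temperature prediction for a candidate frequency.
        return self.a * temp_c + self.b * freq_ghz + self.c


def classify_phase(ipc: float, thresholds=(0.5, 1.5)) -> int:
    """Toy classifier (assumption): bin one counter value into phases 0, 1, 2."""
    return sum(ipc >= t for t in thresholds)


def pick_frequency(temp_c, phase, models, freqs_ghz, limit_c):
    """Return the highest frequency whose predicted temperature stays within
    the limit; fall back to the lowest frequency if none qualifies."""
    model = models[phase]
    safe = [f for f in freqs_ghz if model.predict(temp_c, f) <= limit_c]
    return max(safe) if safe else min(freqs_ghz)


# Illustrative coefficients only; a real system would fit them from sensor data.
MODELS = {
    0: PhaseModel(a=0.9, b=2.0, c=3.0),
    1: PhaseModel(a=0.9, b=3.0, c=3.0),
    2: PhaseModel(a=0.9, b=4.0, c=3.0),
}
FREQS_GHZ = [1.0, 2.0, 3.0]

if __name__ == "__main__":
    phase = classify_phase(ipc=2.0)
    print(phase, pick_frequency(70.0, phase, MODELS, FREQS_GHZ, limit_c=75.0))
```

In this sketch a hotter-running phase gets a larger frequency coefficient, so the same starting temperature yields a lower permitted frequency. The prediction is what lets the controller act before the limit is crossed rather than after.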
