Near-Optimal Thermal Monitoring Framework for Many-Core Systems-on-Chip

Chip designers place on-chip thermal sensors to measure local temperatures and thereby prevent thermal runaway in many-core processing architectures. However, the quality of the thermal reconstruction depends directly on the number of sensors placed, which should be minimized while still guaranteeing full detection of all worst-case temperature gradients. In this paper, we present a complete framework for the thermal management of complex many-core architectures that precisely recovers the thermal distribution from a minimal number of sensors. The proposed sensor placement algorithm is guaranteed to reduce the impact of noisy measurements on the reconstructed thermal distribution. We achieve significant improvements over the state of the art in both computational complexity and reconstruction precision. For example, for a 64-core system-on-chip with 64 noisy sensors (σ² = 4), we achieve an average reconstruction error of 1.5°C, which is less than half of what previous state-of-the-art methods achieve. We also study the practical limits of the proposed method and show that realistic workloads are not needed to learn the model and place the sensors efficiently. In fact, the reconstruction error does not increase significantly if the power traces of the components are generated randomly or if only part of the correct workload is available.
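The abstract does not spell out the algorithm, so the following is only a generic sketch of the kind of pipeline it describes: learn a thermal model from training maps, place a few sensors, and reconstruct the full map from noisy readings. It assumes a low-dimensional linear model obtained by principal component analysis, a simple greedy placement rule that minimizes a mean-squared-error proxy, and least-squares reconstruction. All function names, the synthetic training data, and the parameter choices (8 sensors, 4 basis vectors) are hypothetical and are not taken from the paper.

```python
import numpy as np

def learn_basis(maps, k):
    """Return the mean map and the top-k principal directions (rows of `maps` are thermal maps)."""
    mean = maps.mean(axis=0)
    _, _, vt = np.linalg.svd(maps - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (n_cells, k)

def place_sensors_greedy(basis, n_sensors, eps=1e-6):
    """Illustrative greedy rule (an assumption, not the paper's algorithm):
    add the cell that minimizes trace((B_S^T B_S + eps*I)^-1), an MSE proxy."""
    n_cells, k = basis.shape
    chosen = []
    for _ in range(n_sensors):
        best, best_cost = None, np.inf
        for i in range(n_cells):
            if i in chosen:
                continue
            rows = basis[chosen + [i]]
            cost = np.trace(np.linalg.inv(rows.T @ rows + eps * np.eye(k)))
            if cost < best_cost:
                best, best_cost = i, cost
        chosen.append(best)
    return chosen

def reconstruct(readings, sensors, mean, basis):
    """Least-squares fit of the basis coefficients from sensor readings, then expand to all cells."""
    coeffs, *_ = np.linalg.lstsq(basis[sensors], readings - mean[sensors], rcond=None)
    return mean + basis @ coeffs

# Synthetic demo: 64 cells, randomly generated training maps around 60 degrees C.
rng = np.random.default_rng(0)
n_cells, k = 64, 4
true_basis = rng.normal(size=(n_cells, k))
train = 60.0 + rng.normal(size=(200, k)) @ true_basis.T

mean, basis = learn_basis(train, k)
sensors = place_sensors_greedy(basis, n_sensors=8)

truth = 60.0 + rng.normal(size=k) @ true_basis.T
noisy = truth[sensors] + rng.normal(scale=2.0, size=len(sensors))  # noise variance 4
estimate = reconstruct(noisy, sensors, mean, basis)
print("mean absolute error:", np.abs(estimate - truth).mean())
```

In this toy setting the model is learned from randomly generated maps rather than realistic workloads, which loosely mirrors the robustness claim in the abstract; the numbers it prints say nothing about the results reported in the paper.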
