High-Resolution Thermal Maps Extraction of Multi-Core Processors Based on Convolutional Neural Networks

Thermal issues are a major concern in high-end computing systems because they severely constrain performance and shorten the lifetime of integrated circuits. State-of-the-art processors widely employ dynamic thermal management (DTM) mechanisms, driven by embedded on-die thermal sensors, to prevent thermal runaway in multi-core architectures. Full thermal characterization is particularly useful for fine-grained thermal management techniques such as per-core workload scheduling and voltage and frequency scaling. This paper proposes a new direction for full thermal reconstruction of multi-core processors: convolutional neural networks (CNNs) are used to recover the overall thermal map precisely from a small number of sensors. The effectiveness of the method is verified on a real AMD quad-core processor. Experimental results indicate that the proposed method can handle high-resolution thermal extraction and achieves significant improvements over existing techniques in the literature. The main contribution of this work is to demonstrate that deep learning approaches can perform full thermal reconstruction of semiconductor chips. The proposed method can help DTM achieve more accurate thermal monitoring.
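To make the reconstruction idea concrete, the following minimal Python sketch shows the general shape of a convolutional forward pass that turns a coarse grid of sensor readings into a denser thermal map. It is not the architecture evaluated in the paper: the grid size, the sensor values, the upsampling factor, and the helper names (`upsample_nearest`, `conv2d_same`, `reconstruct`) are all assumptions made for illustration, and a fixed averaging kernel stands in for weights that a real CNN would learn from measured thermal maps.

```python
# Illustrative sketch only: grid sizes, sensor values, and kernel weights are
# hypothetical. A trained CNN would learn its kernels from measured thermal maps.

def upsample_nearest(grid, factor):
    """Repeat every cell `factor` times along both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def conv2d_same(grid, kernel):
    """Single-channel 2D cross-correlation (the CNN-style 'convolution')
    with edge replication, so the output has the same size as the input."""
    h, w = len(grid), len(grid[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # Clamp indices so border cells reuse the nearest valid value.
                    y = min(max(i + a - ph, 0), h - 1)
                    x = min(max(j + b - pw, 0), w - 1)
                    acc += kernel[a][b] * grid[y][x]
            out[i][j] = acc
    return out

def relu(grid):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in grid]

def reconstruct(sensor_grid, factor, kernels):
    """Upsample the coarse sensor grid, then refine it with stacked conv layers.
    ReLU is applied between layers but not after the last one."""
    x = upsample_nearest(sensor_grid, factor)
    for idx, k in enumerate(kernels):
        x = conv2d_same(x, k)
        if idx < len(kernels) - 1:
            x = relu(x)
    return x

# Hypothetical 2x2 sensor readings in degrees Celsius (one sensor per core).
sensors = [[60.0, 70.0],
           [55.0, 65.0]]
# Placeholder 3x3 averaging kernel standing in for learned weights.
box = [[1.0 / 9.0] * 3 for _ in range(3)]
thermal_map = reconstruct(sensors, factor=4, kernels=[box, box])
```

With the placeholder kernel this amounts to smoothed interpolation of the sensor readings. In a learned model the kernels would instead be fitted so that the output matches reference thermal maps, which is where any accuracy gain over interpolation would come from.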
