Techniques for overheating detection and sensor allocation in a real dual-core processor

Current processor families widely deploy on-chip thermal sensors to continuously monitor real-time thermal behavior. However, on-chip thermal sensors are inevitably affected by a variety of noise sources, such as fabrication randomness and environmental uncertainty, which directly affect the reliability of dynamic thermal management (DTM). In this paper, sensor allocation for overheating detection is formulated as a set of constrained optimization problems in which the sensor observations are corrupted by noise. A lightweight sensor allocation scheme (LSAS) based on a custom-built genetic algorithm is proposed to significantly improve overheating detection performance with approximately linear execution time. Building on the LSAS and greedy optimization techniques, a hybrid algorithm for local overheating detection is also proposed to identify the optimal sensor allocation for each individual processor block. An infrared temperature measurement setup is developed to capture the thermal traces of a 45 nm dual-core AMD Athlon X2 5000 processor, and the measured temperature data are used to verify performance. Experimental results show that the LSAS achieves an overheating detection probability of up to 0.93 at a cost of ten sensors.
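
The general idea of choosing sensor locations with a genetic algorithm so that overheating is detected from noisy readings can be illustrated with a minimal sketch. This is not the LSAS described above: the fitness function, the crossover and mutation operators, the synthetic thermal traces, and every name (detection_probability, genetic_allocation, make_synthetic_traces) are assumptions made purely for illustration.

```python
import random

def detection_probability(sensors, traces, noisy, threshold):
    """Fraction of overheating frames in which at least one chosen
    sensor's noisy reading exceeds the threshold (assumed metric)."""
    hot = [t for t in range(len(traces)) if max(traces[t]) > threshold]
    if not hot:
        return 0.0
    hits = sum(1 for t in hot if any(noisy[t][s] > threshold for s in sensors))
    return hits / len(hot)

def genetic_allocation(traces, noisy, k, threshold,
                       pop_size=30, generations=40, mutation_rate=0.2, rng=None):
    """Search for k sensor locations that maximise detection probability."""
    rng = rng or random.Random(0)
    n = len(traces[0])

    def fitness(individual):
        return detection_probability(individual, traces, noisy, threshold)

    # Each individual is a sorted tuple of k distinct location indices.
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]      # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))
            child = set(rng.sample(pool, k))      # crossover keeps exactly k sensors
            if rng.random() < mutation_rate:      # mutation: relocate one sensor
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([i for i in range(n) if i not in child]))
            children.append(tuple(sorted(child)))
        population = parents + children
    return max(population, key=fitness)

def make_synthetic_traces(n_locations=16, n_frames=200, sigma=2.0, rng=None):
    """Synthetic stand-in for measured thermal traces (not real data)."""
    rng = rng or random.Random(1)
    traces, noisy = [], []
    for _ in range(n_frames):
        hotspot = rng.randrange(n_locations)
        frame = [60.0 + rng.uniform(0.0, 25.0) - 3.0 * abs(i - hotspot)
                 for i in range(n_locations)]
        traces.append(frame)
        # Additive Gaussian sensor noise, an assumed noise model.
        noisy.append([temp + rng.gauss(0.0, sigma) for temp in frame])
    return traces, noisy

traces, noisy = make_synthetic_traces()
best = genetic_allocation(traces, noisy, k=4, threshold=75.0)
print("chosen locations:", best)
print("estimated detection probability:",
      detection_probability(best, traces, noisy, 75.0))
```

In this sketch the sensor budget is enforced by construction, since every individual always holds exactly k distinct locations, and keeping the better half of each generation means the best allocation found so far is never discarded.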
