On-Line Temperature Estimation for Noisy Thermal Sensors Using a Smoothing Filter-Based Kalman Predictor

Dynamic thermal management (DTM) mechanisms use embedded thermal sensors to collect fine-grained temperature information for monitoring the real-time thermal behavior of multi-core processors. However, embedded thermal sensors are highly susceptible to several sources of noise, including environmental uncertainty and process variation. This noise causes discrepancies between the actual temperatures and those observed by on-chip thermal sensors, which seriously degrade the efficiency of DTM. In this paper, a smoothing filter-based Kalman prediction technique is proposed to accurately estimate temperatures from noisy sensor readings. For the multi-sensor estimation scenario, the spatial correlations among different sensor locations are exploited. On this basis, a multi-sensor synergistic calibration algorithm (MSSCA) is proposed to improve the simultaneous prediction accuracy of multiple sensors. Moreover, an infrared imaging-based temperature measurement technique is proposed to capture the thermal traces of an Advanced Micro Devices (AMD) quad-core processor in real time. The acquired real temperature data are used to evaluate the prediction performance. Simulation results show that, compared with assuming the thermal sensor readings to be ideal, the proposed synergistic calibration scheme can reduce the root-mean-square error (RMSE) by 1.2 °C and increase the signal-to-noise ratio (SNR) by 15.8 dB, with a very small average runtime overhead. Additionally, the average false alarm rate (FAR) of the corrected sensor temperature readings can be reduced by 28.6%. These results demonstrate that, when this approach is used for temperature estimation, the response mechanisms of DTM can be triggered to adjust voltages, frequencies, and cooling fan speeds at more appropriate times.
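To make the general idea concrete, the following is a minimal sketch of smoothing followed by Kalman estimation on a single noisy sensor trace, scored by RMSE. It is not the authors' algorithm and does not implement MSSCA: the causal moving average, the random-walk temperature model, the synthetic sinusoidal trace, and every name and tuning value (`moving_average`, `kalman_filter`, `process_var`, `sensor_var`, the noise level) are assumptions chosen only for illustration.

```python
import math
import random


def moving_average(values, window=3):
    """Causal moving average: each output averages up to the last `window` inputs."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def kalman_filter(readings, process_var=0.25, sensor_var=4.0):
    """Scalar Kalman filter under an assumed random-walk temperature model.

    process_var and sensor_var are hypothetical tuning values, not taken
    from the paper.
    """
    estimate = readings[0]
    error_var = sensor_var
    estimates = [estimate]
    for z in readings[1:]:
        # Predict: the random-walk model keeps the estimate, uncertainty grows.
        error_var += process_var
        # Update: blend the prediction with the new reading via the Kalman gain.
        gain = error_var / (error_var + sensor_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        estimates.append(estimate)
    return estimates


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic "true" temperature trace (deg C) and noisy sensor readings;
# both are made up for this sketch.
rng = random.Random(0)
true_temps = [60.0 + 5.0 * math.sin(2 * math.pi * k / 200) for k in range(400)]
noisy = [t + rng.gauss(0.0, 2.0) for t in true_temps]

# Smooth first, then run the Kalman filter on the smoothed readings.
estimated = kalman_filter(moving_average(noisy, window=3))

print("RMSE of raw readings:      ", rmse(noisy, true_temps))
print("RMSE of filtered estimates:", rmse(estimated, true_temps))
```

Under the random-walk assumption, the one-step-ahead prediction equals the current filtered estimate, so the same loop can serve as a simple predictor. A larger `process_var` tracks fast temperature changes more closely at the cost of passing more noise through.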
