Improved Thermal Tracking for Processors Using Hard and Soft Sensor Allocation Techniques

Hot spots are a major concern in high-end processors because they constrain performance and limit the lifetime of semiconductor chips. Using embedded thermal sensors, dynamic thermal management systems track hot spots at runtime and adjust the processor's performance and cooling system when necessary. In many-core processors, the locations of hot spots vary spatially and temporally depending on which cores are active and which workloads they run. Our work includes both theoretical advances in sensor allocation techniques and experimental advances in thermal imaging of real processors. We propose a hard sensor allocation algorithm that determines the sensor locations at which hot spots can be tracked accurately given a fixed budget of sensors. We also propose soft sensor computation techniques to alleviate design constraints on sensor locations and to further improve the resolution of hot spot tracking. The proposed soft sensing technique combines the measurements of the hard sensors in an optimal way to estimate the temperature at any desired location. We use infrared imaging to characterize the thermal behavior of a real dual-core processor during operation. We execute a large number of workload configurations on the processor and track the locations and temperatures of hot spots at runtime. The thermal characterization data are then used as the input to our sensor allocation techniques. We demonstrate that our sensor allocation techniques improve significantly upon previous results in the literature and provide accurate tracking of hot spots.
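As a rough illustration of the soft sensing idea, the sketch below estimates the temperature at an unsensed location as a linear combination of hard sensor readings, with weights fitted by ordinary least squares on characterization data. The abstract does not state which notion of optimality is used, so the least-squares criterion, the function names `fit_soft_sensor` and `estimate`, and the synthetic data are all assumptions made for this example rather than the authors' method.

```python
import numpy as np


def fit_soft_sensor(hard_readings, target_temps):
    """Fit weights and an offset so that target ~= hard_readings @ w + b.

    hard_readings: (n_samples, n_sensors) hard sensor temperatures.
    target_temps:  (n_samples,) temperature at the location of interest,
                   e.g. taken from thermal characterization data.
    Least squares is an assumed criterion, not one taken from the abstract.
    """
    n_samples = hard_readings.shape[0]
    # Append a column of ones so the offset is fitted with the weights.
    design = np.hstack([hard_readings, np.ones((n_samples, 1))])
    coeffs, *_ = np.linalg.lstsq(design, target_temps, rcond=None)
    return coeffs[:-1], coeffs[-1]


def estimate(hard_reading, weights, offset):
    """Soft sensor estimate for one vector of hard sensor readings."""
    return float(np.dot(hard_reading, weights) + offset)


# Synthetic characterization data (hypothetical, for illustration only):
# three hard sensors, 200 workload snapshots, temperatures in Celsius.
rng = np.random.default_rng(0)
hard = rng.uniform(40.0, 80.0, size=(200, 3))
true_w = np.array([0.5, 0.3, 0.2])
true_b = 2.0
target = hard @ true_w + true_b

weights, offset = fit_soft_sensor(hard, target)
reading = np.array([60.0, 70.0, 50.0])
soft_temp = estimate(reading, weights, offset)
```

Because the synthetic target is an exact linear function of the readings, the fitted weights recover `true_w` and `true_b` up to numerical precision. Measured thermal data would leave a residual, and the size of that residual is one way to judge whether the chosen hard sensor locations carry enough information about the target location.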
