Distributed task migration for thermal hot spot reduction in many-core microprocessors

In this paper, we propose a new distributed task migration method that reduces thermal hot spots and on-chip temperature variance, leading to better thermal reliability and lower package costs for emerging many-core processors. The novelty of the algorithm is that task migration is performed in a fully distributed way while still maintaining a degree of global view to guide the process. This is enabled by a recently proposed distributed state tracking technique that dynamically estimates the average temperature of all cores, which provides the global view of whole-chip temperature needed to guide local task migration among cores efficiently. In addition, local task migration is carried out based on the power, temperature, and load influence of neighboring cores. Our experimental results on a 36-core microprocessor demonstrate that the proposed method reduces 30% more thermal hot spots than the existing distributed thermal management method, leading to a more balanced temperature distribution across many-core microprocessor chips.
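The idea of guiding local migration with a distributed estimate of the chip-wide average temperature can be illustrated with a minimal sketch. It is not the algorithm of this paper: it assumes a 6x6 mesh in which each core exchanges values only with its four nearest neighbors, it substitutes a plain static average-consensus iteration for the dynamic state tracking technique mentioned above, and it decides migration from temperature alone, ignoring power and load. The names `mesh_neighbors`, `consensus_step`, `estimate_average`, and `pick_migration_target`, and the step size `eps`, are all hypothetical.

```python
def mesh_neighbors(rows, cols):
    """Map each core index to its 4-connected neighbors on a rows x cols mesh."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            adj = []
            if r > 0:
                adj.append((r - 1) * cols + c)
            if r < rows - 1:
                adj.append((r + 1) * cols + c)
            if c > 0:
                adj.append(r * cols + c - 1)
            if c < cols - 1:
                adj.append(r * cols + c + 1)
            nbrs[r * cols + c] = adj
    return nbrs


def consensus_step(estimates, nbrs, eps=0.2):
    """One synchronous update: every core moves toward its neighbors' estimates.

    Assumption: eps is below 1 / (maximum node degree), which keeps the
    iteration stable on an undirected, connected graph.
    """
    return [
        x + eps * sum(estimates[j] - x for j in nbrs[i])
        for i, x in enumerate(estimates)
    ]


def estimate_average(temps, nbrs, steps=300, eps=0.2):
    """Run the consensus iteration; each core's value approaches the chip average."""
    est = list(temps)
    for _ in range(steps):
        est = consensus_step(est, nbrs, eps)
    return est


def pick_migration_target(core, temps, avg_estimate, nbrs):
    """Return the coolest neighbor if this core is hotter than its own
    estimate of the average; otherwise return None (no migration)."""
    if temps[core] <= avg_estimate[core]:
        return None
    target = min(nbrs[core], key=lambda j: temps[j])
    return target if temps[target] < temps[core] else None


if __name__ == "__main__":
    nbrs = mesh_neighbors(6, 6)
    temps = [50.0] * 36      # hypothetical per-core temperatures in Celsius
    temps[0] = 90.0          # one hot core in the corner
    est = estimate_average(temps, nbrs)
    print("core 0 estimate of chip average:", round(est[0], 3))
    print("core 0 migrates to:", pick_migration_target(0, temps, est, nbrs))
```

Each core uses only values from its immediate neighbors, yet its estimate converges to the global mean because the symmetric update preserves the sum of all estimates. Handling temperatures that change while the estimate is being formed would require a dynamic tracking scheme, which this sketch does not attempt.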
