Limiting temperature gradients on many-cores by adaptive reallocation of real-time workloads

The advent of many-core processors came with the increase in computational power needed for future applications. However new challenges arrived at the same time, especially for the real-time community. Each core on such a processor is a heat source and uneven usage can lead to hot spots on the processor, affecting its lifetime and reliability. For real-time systems, it is therefore of paramount importance to keep the temperature differences between the individual cores below critical values, in order to prevent premature failure of the system. We argue that this problem can not be solved by traditional approaches, since the growing number of cores makes them intractable. We rather argue to split the problem in the spacial domain and control the temperature on core level. The cores control their temperature by rearranging the load in a predictable manner during runtime. To achieve this, a feedback controller is implemented on each core. We conclude our work with a simulation based evaluation of the proposed approach comparing its performance against a previously presented algorithm.

[1]  Jose Renau,et al.  Characterizing processor thermal behavior , 2010, ASPLOS XV.

[2]  Jianyang Zhou,et al.  Research and analysis of routing algorithms for NoC , 2011, 2011 3rd International Conference on Computer Research and Development.

[3]  John Paul Shen,et al.  Best of both latency and throughput , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[4]  Lionel M. Ni,et al.  A survey of wormhole routing techniques in direct networks , 1993, Computer.

[5]  Eun Jung Kim,et al.  Predictive dynamic thermal management for multicore systems , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[6]  Luca Benini,et al.  Networks on Chips : A New SoC Paradigm , 2022 .

[7]  Sung Woo Chung,et al.  Load Unbalancing Strategy for Multicore Embedded Processors , 2010, IEEE Transactions on Computers.

[8]  Thomas Nolte,et al.  Dynamic power management for thermal control of many-core real-time systems , 2014, SIGBED.

[9]  Guanglei Liu,et al.  Neighbor-aware dynamic thermal management for multi-core platform , 2012, 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[10]  Qinru Qiu,et al.  Distributed task migration for thermal management in many-core systems , 2010, Design Automation Conference.

[11]  R. Viswanath Thermal Performance Challenges from Silicon to Systems , 2000 .

[12]  Sarma B. K. Vrudhula,et al.  Performance Optimal Online DVFS and Task Migration Techniques for Thermally Constrained Multi-Core Processors , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[13]  Alan Burns,et al.  Real-Time Communication Analysis for On-Chip Networks with Wormhole Switching , 2008, Second ACM/IEEE International Symposium on Networks-on-Chip (nocs 2008).

[14]  Yajun Ha,et al.  Thermal-aware frequency scaling for adaptive workloads on heterogeneous MPSoCs , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[15]  Kang G. Shin,et al.  Predicting thermal behavior for temperature management in time-critical multicore systems , 2013, 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS).

[16]  James W. Layland,et al.  Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment , 1989, JACM.

[17]  M. Hoagland,et al.  Feedback Systems An Introduction for Scientists and Engineers SECOND EDITION , 2015 .

[18]  Kai Ma,et al.  Temperature-constrained power control for chip multiprocessors with online model estimation , 2009, ISCA '09.

[19]  Kevin Skadron,et al.  HotSpot: a compact thermal modeling methodology for early-stage VLSI design , 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[20]  Amit Kumar Singh,et al.  Mapping on multi/many-core systems: Survey of current and emerging trends , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[21]  Kevin Skadron,et al.  Exploring the thermal impact on manycore processor performance , 2010, 2010 26th Annual IEEE Semiconductor Thermal Measurement and Management Symposium (SEMI-THERM).

[22]  Pradip Bose,et al.  The case for lifetime reliability-aware microprocessors , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[23]  A. Abdel-azim Fundamentals of Heat and Mass Transfer , 2011 .

[24]  Lothar Thiele,et al.  Thermal-Aware Global Real-Time Scheduling on Multicore Systems , 2009, 2009 15th IEEE Real-Time and Embedded Technology and Applications Symposium.

[25]  Stefan M. Petters,et al.  Identifying the sources of unpredictability in COTS-based multicore systems , 2013, 2013 8th IEEE International Symposium on Industrial Embedded Systems (SIES).

[26]  Mahmut T. Kandemir,et al.  Leakage Current: Moore's Law Meets Static Power , 2003, Computer.

[27]  Qi Yang,et al.  Energy-aware partitioning for multiprocessor real-time systems , 2003, Proceedings International Parallel and Distributed Processing Symposium.

[28]  Xiaobo Sharon Hu,et al.  Temperature-Aware Scheduling and Assignment for Hard Real-Time Applications on MPSoCs , 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[29]  Alan Burns,et al.  A survey of hard real-time scheduling for multiprocessor systems , 2011, CSUR.