CAnDy-TM: Comparative analysis of dynamic thermal management in many-cores using model checking

Dynamic thermal management (DTM) techniques based on task migration are a promising way to mitigate thermal emergencies and thereby ensure the safe operation and reliability of many-core systems. These techniques are classified as central or distributed, depending on whether a single DTM controller manages the whole system or individual DTM controllers manage each core or set of cores. However, obtaining a trustworthy comparison between central (cDTM) and distributed (dDTM) techniques, in order to find the most suitable one for a given system, is quite challenging. This is primarily due to the systemic differences between cDTM and dDTM controllers and the inherent non-exhaustiveness of the simulation and emulation methods conventionally used for DTM analysis. In this paper, we present a novel methodology called CAnDy-TM (Comparative Analysis of Dynamic Thermal Management) that employs model checking to perform a formal comparative analysis of cDTM and dDTM techniques. We identify a set of generic functional and performance properties that provide a common ground for their comparison. We demonstrate the usability and benefits of our methodology by comparing state-of-the-art cDTM and dDTM techniques, and show which technique performs better with respect to thermal stability and other task migration parameters. Such an analysis helps in selecting the most appropriate DTM technique for a given chip.
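To make the idea of an exhaustive, property-based comparison concrete, the following is a minimal sketch and not the paper's models or tooling. It assumes a toy system in which the number of tasks on a core stands in for its temperature, the cores form a ring, and both migration policies (`central_step`, `distributed_step`) are hypothetical simplifications invented for illustration. The checker enumerates every initial task placement, runs each policy to a fixed point, and reports two generic properties: the worst-case load spread at stability and the worst-case number of migrations.

```python
from itertools import product


def central_step(state):
    """Hypothetical central policy: one controller sees all cores and
    moves one task from the most loaded core to the least loaded one."""
    hot = max(range(len(state)), key=lambda i: state[i])
    cold = min(range(len(state)), key=lambda i: state[i])
    if state[hot] - state[cold] < 2:
        return state  # already stable
    nxt = list(state)
    nxt[hot] -= 1
    nxt[cold] += 1
    return tuple(nxt)


def distributed_step(state):
    """Hypothetical distributed policy: each core sees only its two ring
    neighbors; the first core with a neighbor at least 2 tasks lighter
    migrates one task to it."""
    n = len(state)
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            if state[i] - state[j] >= 2:
                nxt = list(state)
                nxt[i] -= 1
                nxt[j] += 1
                return tuple(nxt)
    return state  # no core can improve locally


def check(step, n_cores, n_tasks):
    """Exhaustively explore every initial placement of n_tasks on n_cores.
    Returns (worst spread at the fixed point, worst migration count)."""
    worst_spread = 0
    worst_migrations = 0
    for init in product(range(n_tasks + 1), repeat=n_cores):
        if sum(init) != n_tasks:
            continue
        state, seen, migrations = init, {init}, 0
        while True:
            nxt = step(state)
            if nxt == state:
                break  # fixed point reached: the policy is stable here
            # Revisiting a state would mean the policy can loop forever.
            assert nxt not in seen, "livelock detected"
            seen.add(nxt)
            state = nxt
            migrations += 1
        worst_spread = max(worst_spread, max(state) - min(state))
        worst_migrations = max(worst_migrations, migrations)
    return worst_spread, worst_migrations


if __name__ == "__main__":
    for name, step in (("central", central_step),
                       ("distributed", distributed_step)):
        spread, migrations = check(step, n_cores=4, n_tasks=4)
        print(f"{name}: worst spread={spread}, worst migrations={migrations}")
```

In this toy setting with four cores and four tasks, the central policy always ends perfectly balanced, whereas the neighbor-only policy can stop at a placement such as (2, 1, 0, 1), where no adjacent pair differs by two or more. A symbolic model checker would establish such properties over far larger state spaces than this brute-force enumeration can cover.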
