An SMDP-Based Approach to Thermal-Aware Task Scheduling in NoC-Based MPSoC Platforms

One efficient way to control chip-wide thermal distribution in multi-core systems is to optimize the online assignment of tasks to processing cores. Online task assignment, however, faces several uncertainties in real-world systems and is not deterministic. In this paper, we consider a thermal-aware task scheduler that dispatches tasks from an arrival queue and sets the voltage and frequency of the processing cores to optimize the mean temperature margin of the entire chip (i.e., the cores as well as the NoC routers). We model the decision process of the task scheduler as a semi-Markov decision problem (SMDP). To solve the formulated SMDP, we propose two reinforcement learning algorithms that can compute the optimal task assignment policy without requiring statistical knowledge of the stochastic dynamics underlying the system states. The proposed algorithms also rely on function approximation techniques to handle the infinite length of the task queue as well as the continuous nature of temperature readings. Compared to related research, the simulation results show a reduction of nearly 6 K in system average peak temperature and a decrease of 66 ms in mean task service time.
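To make the combination of an SMDP formulation, model-free learning, and function approximation concrete, the following is a minimal sketch of one discounted SMDP Q-learning step with a linear approximator. It is not the paper's algorithm: the abstract does not specify either of the two proposed methods, so the discounted update rule, the feature map `features`, the reward value, and the parameters `alpha` and `beta` are all assumptions chosen for illustration.

```python
import math
import random


def features(queue_len, temp_margin):
    """Hypothetical feature map (an assumption, not from the paper).

    Squashes the unbounded queue length into [0, 1) and scales the
    continuous temperature margin, so both fit a linear approximator.
    """
    return [1.0, queue_len / (1.0 + queue_len), temp_margin / 100.0]


def q_value(weights, phi, action):
    """Linear action-value estimate: dot product of weights[action] and phi."""
    return sum(w * x for w, x in zip(weights[action], phi))


def smdp_q_update(weights, phi, action, reward, sojourn, phi_next,
                  alpha=0.1, beta=0.05):
    """One semi-gradient SMDP Q-learning step; returns the TD error.

    The discount exp(-beta * sojourn) depends on the random time spent
    between decision epochs, which is what distinguishes the SMDP update
    from the fixed-step MDP update. alpha and beta are assumed values.
    """
    discount = math.exp(-beta * sojourn)
    best_next = max(q_value(weights, phi_next, a) for a in range(len(weights)))
    td_error = reward + discount * best_next - q_value(weights, phi, action)
    for i, x in enumerate(phi):
        weights[action][i] += alpha * td_error * x
    return td_error


def choose_action(weights, phi, epsilon, rng):
    """Epsilon-greedy choice over the discrete action set."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    return max(range(len(weights)), key=lambda a: q_value(weights, phi, a))


# Two hypothetical actions, e.g. (core, voltage/frequency level) pairs.
weights = [[0.0, 0.0, 0.0] for _ in range(2)]
phi = features(queue_len=3, temp_margin=20.0)
phi_next = features(queue_len=2, temp_margin=15.0)
td = smdp_q_update(weights, phi, action=0, reward=1.0, sojourn=2.0,
                   phi_next=phi_next)
greedy = choose_action(weights, phi, epsilon=0.0, rng=random.Random(0))
```

The update needs only observed transitions (state features, action, reward, sojourn time, next state features), which is the sense in which such a learner requires no statistical knowledge of the underlying dynamics.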
