Leakage-Aware Predictive Thermal Management for Multicore Systems Using Echo State Network

Leakage power is becoming significant in new-generation IC chips. Because leakage power depends nonlinearly on temperature, thermal management of today’s multicore systems becomes a challenging nonlinear control problem. In this paper, we propose a new predictive dynamic thermal management (DTM) method with a neural network thermal model that naturally accounts for the inherent nonlinearity between leakage and temperature. We start by analyzing the problems of using a recurrent neural network (RNN) to build the nonlinear thermal model, and point out that it suffers from a long-term dependencies problem induced by exploding gradients, which leads to large model prediction errors. Based on this analysis, we propose to use an echo state network (ESN), a special type of RNN, as the leakage-aware nonlinear thermal model. We show theoretically and experimentally that the ESN achieves much higher accuracy by completely avoiding the long-term dependencies problem. On top of this nonlinear ESN thermal model, we propose a novel model predictive control (MPC) scheme called ESN MPC, which uses iterative steps to find the optimal future power recommendations for thermal management. By accounting for the nonlinear leakage-temperature effects and using an advanced control technique, the new method achieves high-quality temperature management overall, with smooth and accurate temperature tracking. Experimental results show that the new method outperforms the state-of-the-art leakage-aware multicore DTM method in both temperature management quality and computing overhead.
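
To make the ESN idea concrete, the following is a minimal sketch in plain Python, not the implementation evaluated in the paper. It illustrates why an ESN sidesteps gradient-based training of recurrent weights: the reservoir is random and fixed, and only a linear readout is fitted by ridge regression. All names (`make_reservoir`, `toy_thermal_trace`, and so on) are hypothetical, the single-core thermal model with exponential leakage is an invented toy in normalized units, and the reservoir is scaled by its maximum absolute row sum, which is a conservative sufficient condition for the echo state property rather than the more common spectral-radius scaling.

```python
import math
import random

def make_reservoir(n_units, n_inputs, row_norm=0.9, seed=0):
    """Build fixed random weights; only the readout is trained later."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(n_units)] for _ in range(n_units)]
    # Scale so the largest absolute row sum equals row_norm < 1. With tanh
    # units the update is then a contraction in the max norm (assumption:
    # this replaces the usual spectral-radius scaling for simplicity).
    max_row = max(sum(abs(v) for v in row) for row in w)
    w = [[v * row_norm / max_row for v in row] for row in w]
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_units)]
    return w, w_in

def step(w, w_in, x, u):
    """One reservoir update: x <- tanh(W x + W_in u)."""
    return [math.tanh(sum(a * xj for a, xj in zip(wi, x)) +
                      sum(b * uj for b, uj in zip(vi, u)))
            for wi, vi in zip(w, w_in)]

def run(w, w_in, inputs, x0=None):
    """Drive the reservoir with an input sequence and collect all states."""
    x = list(x0) if x0 is not None else [0.0] * len(w)
    states = []
    for u in inputs:
        x = step(w, w_in, x, u)
        states.append(x)
    return states

def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def train_readout(states, targets, ridge=1e-4):
    """Ridge regression for a scalar output: (S^T S + ridge I) w = S^T y."""
    feats = [s + [1.0] for s in states]  # append a bias feature
    d = len(feats[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(d)]
    return solve(gram, rhs)

def predict(w_out, state):
    """Linear readout applied to a reservoir state plus bias."""
    return sum(wi * si for wi, si in zip(w_out, state + [1.0]))

def toy_thermal_trace(n_steps, seed=1):
    """Hypothetical normalized single-core model with exponential leakage."""
    rng = random.Random(seed)
    theta, powers, temps = 0.3, [], []
    for k in range(n_steps):
        if k % 20 == 0:
            p = rng.uniform(0.0, 0.5)  # piecewise-constant dynamic power
        leak = 0.1 * math.exp(theta)   # leakage grows with temperature
        theta = 0.9 * theta + 0.1 * (p + leak)
        powers.append([p])
        temps.append(theta)
    return powers, temps

if __name__ == "__main__":
    w, w_in = make_reservoir(20, 1)
    powers, temps = toy_thermal_trace(300)
    states = run(w, w_in, powers)
    w_out = train_readout(states, temps)
    mse = sum((predict(w_out, s) - y) ** 2
              for s, y in zip(states, temps)) / len(temps)
    print("training MSE:", mse)
```

Because training reduces to one linear solve, no gradient is propagated back through time, which is the property the abstract relies on. An MPC layer on top of such a model would repeatedly call `step` and `predict` over a prediction horizon to evaluate candidate power sequences; that part is not sketched here.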
