Thermal-aware adaptive VM allocation considering server locations in heterogeneous data centers

Abstract: Virtualized data centers usually consist of heterogeneous servers with different specifications (and thus different performance). Although such data centers usually contain unused heterogeneous servers, conventional DTM (Dynamic Thermal Management) techniques based on DVFS (Dynamic Voltage and Frequency Scaling) do not exploit these unused servers to cool down hot servers. In this paper, we propose a novel DTM technique that adaptively exploits both the external computing resources (unused servers with different performance) and the internal computing resources (unused CPU cores within a server) available in heterogeneous data centers. We also propose to consider server locations when migrating VMs (Virtual Machines) among servers in a rack, because heat conduction gives server location a large impact on on-chip temperature and performance. When VMs run on the two closest servers in a rack, the ambient temperature of the servers is up to 6.2 degrees higher than when the VMs run on the two farthest servers; on-chip temperature then rises more rapidly, causing up to 13.5% performance degradation due to more frequent thermal throttling. When the temperature of a CPU core in a server exceeds a pre-defined thermal threshold, our technique estimates the performance impact of VM migrations (e.g., the degradation caused by migrating VMs between physical machines and/or between cores). Depending on this estimate, it adaptively employs the following three methods: 1) migrating a VM to a distant server with different performance, 2) migrating VMs among CPU cores within the server, and 3) applying DVFS. In our experiments, the proposed technique improves performance by 15.1% and reduces system-wide EDP (energy-delay product) by 22.9%, on average, compared to a state-of-the-art DVFS-based DTM technique, while satisfying thermal constraints.
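The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the selection rule, so the sketch assumes that the method with the lowest estimated performance loss is chosen, and that a "distant" server is the unused rack slot whose nearest active server is farthest away. The names `Estimates`, `choose_action`, and `farthest_unused_slot` are hypothetical, as is the representation of rack positions as integer slot indices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Estimates:
    """Hypothetical estimated performance loss (fraction in 0..1) per method."""
    server_migration: Optional[float]  # None if no unused server is available
    core_migration: Optional[float]    # None if no unused core is available
    dvfs: float                        # DVFS throttling is always possible


def farthest_unused_slot(unused_slots: Sequence[int],
                         active_slots: Sequence[int]) -> Optional[int]:
    """Pick the unused rack slot whose nearest active server is farthest away.

    Assumption: distance in slot indices stands in for thermal coupling.
    """
    if not unused_slots:
        return None
    if not active_slots:
        return min(unused_slots)
    return max(unused_slots,
               key=lambda s: min(abs(s - a) for a in active_slots))


def choose_action(core_temp: float, threshold: float,
                  est: Estimates) -> Optional[str]:
    """Return the chosen method, or None while the core is within the threshold."""
    if core_temp <= threshold:
        return None
    # Assumed rule: take the available method with the lowest estimated loss.
    options = {"dvfs": est.dvfs}
    if est.server_migration is not None:
        options["server_migration"] = est.server_migration
    if est.core_migration is not None:
        options["core_migration"] = est.core_migration
    return min(options, key=options.get)
```

For example, with active servers in slots 0 and 3 and unused servers in slots 1, 2, and 7, `farthest_unused_slot` selects slot 7, since its nearest active neighbor is four slots away. A real system would replace the fixed estimates with measured or modeled migration and throttling costs.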
