Smart VM co-scheduling with the precise prediction of performance characteristics

Abstract
Traditional virtualization systems cannot effectively isolate the micro-architectural resources that VMs share. When different types of CPU-intensive and memory-intensive VMs contend for these shared resources, they suffer different levels of performance degradation, which reduces system efficiency and Quality of Service (QoS) in the cloud. To address these problems, we design and implement a smart VM co-scheduling system built on precise prediction of performance characteristics. First, we identify the factors that cause performance interference and design synthetic micro-benchmarks. By co-running these micro-benchmarks with VMs, we decouple two VM performance characteristics: contention sensitivity and contention intensity. Second, based on these characteristics, we build a VM performance prediction model using machine learning techniques to quantify the level of performance degradation precisely. By co-running large numbers of different VMs and collecting their performance scores, we train a robust prediction model. Finally, based on this model, we design contention-aware VM scheduling algorithms that improve system efficiency and guarantee the QoS of VMs in the cloud. Our experimental results show that the prediction model achieves high accuracy and that the scheduling algorithms built on it improve both system efficiency and VM performance stability.
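
The abstract does not say how the two characteristics are computed from the micro-benchmark co-runs. A minimal sketch, assuming each one is summarised as a mean fractional slowdown: contention sensitivity is the slowdown the VM suffers when co-run with each micro-benchmark, and contention intensity is the slowdown the VM inflicts on each micro-benchmark. The function names, the benchmark labels `llc` and `membw`, and the plain averaging are illustrative assumptions, not the paper's method.

```python
from statistics import mean

# Hypothetical inputs: wall-clock run times in seconds, measured once with
# the workload running alone ("solo") and once co-run with a partner.

def slowdown(t_corun: float, t_solo: float) -> float:
    """Fractional slowdown: 0.25 means the co-run took 25% longer."""
    return t_corun / t_solo - 1.0

def contention_sensitivity(vm_solo: float, vm_corun: dict[str, float]) -> float:
    """How much the VM suffers: its mean slowdown across micro-benchmarks.

    vm_corun maps a micro-benchmark label to the VM's run time when
    co-run with that micro-benchmark."""
    return mean(slowdown(t, vm_solo) for t in vm_corun.values())

def contention_intensity(bench_solo: dict[str, float],
                         bench_corun: dict[str, float]) -> float:
    """How much the VM hurts others: mean slowdown it inflicts on each
    micro-benchmark (both dicts are keyed by micro-benchmark label)."""
    return mean(slowdown(bench_corun[b], bench_solo[b]) for b in bench_solo)

if __name__ == "__main__":
    # Illustrative numbers only: "llc" stands for a cache-pressure and
    # "membw" for a memory-bandwidth-pressure micro-benchmark (hypothetical).
    s = contention_sensitivity(10.0, {"llc": 12.0, "membw": 13.0})
    i = contention_intensity({"llc": 4.0, "membw": 5.0}, {"llc": 5.0, "membw": 5.5})
    print(f"sensitivity={s:.3f} intensity={i:.3f}")
```

Averaging hides which resource a VM is sensitive to; keeping the per-benchmark slowdowns as a feature vector is the natural alternative if the prediction model needs that detail.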

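The prediction and scheduling steps can be sketched in the same spirit, under loudly stated assumptions. The paper trains a machine-learning model on many VM co-runs; here a one-feature ordinary least-squares line on sensitivity(A) × intensity(B) stands in for it. Scheduling is posed as pairing VMs onto shared sockets so that the total predicted slowdown is minimal, solved exactly with a bitmask dynamic program that is only practical for small VM counts. `fit_line`, `predict`, `best_pairing`, the product feature and the two-VMs-per-socket placement model are all hypothetical.

```python
from functools import lru_cache
from math import inf

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = w * x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct feature values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx

def predict(model: tuple[float, float], sens_a: float, int_b: float) -> float:
    """Hypothetical model: predicted slowdown of VM a when co-run with VM b."""
    w, b = model
    return w * sens_a * int_b + b

def best_pairing(vms: dict[str, tuple[float, float]], model: tuple[float, float]):
    """Pair VMs (name -> (sensitivity, intensity)) so that the total predicted
    slowdown is minimal. Exact bitmask DP, O(2^n * n) time."""
    names = list(vms)
    n = len(names)
    if n % 2:
        raise ValueError("expects an even number of VMs")

    def cost(a: str, b: str) -> float:
        # a's slowdown caused by b, plus b's slowdown caused by a.
        return (predict(model, vms[a][0], vms[b][1])
                + predict(model, vms[b][0], vms[a][1]))

    @lru_cache(maxsize=None)
    def solve(mask: int) -> tuple[float, tuple]:
        # mask holds one bit per VM that still has to be placed.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest-indexed unplaced VM
        best = (inf, ())
        for j in range(i + 1, n):
            if mask >> j & 1:
                sub_cost, sub_pairs = solve(mask & ~(1 << i) & ~(1 << j))
                total = sub_cost + cost(names[i], names[j])
                if total < best[0]:
                    best = (total, ((names[i], names[j]),) + sub_pairs)
        return best

    return solve((1 << n) - 1)

if __name__ == "__main__":
    # Fit on hypothetical (sensitivity * intensity, measured slowdown) samples.
    model = fit_line([0.01, 0.09, 0.25, 0.64], [0.02, 0.10, 0.24, 0.66])
    vms = {"A": (0.9, 0.9), "B": (0.8, 0.8), "C": (0.1, 0.1), "D": (0.1, 0.1)}
    total, pairs = best_pairing(vms, model)
    print(pairs, round(total, 3))
```

A naive greedy rule that co-locates the cheapest pair first can backfire here: with two aggressive VMs (A, B) and two quiet ones (C, D), it pairs C with D first and is then forced to put A and B on the same socket. That is why the sketch searches over all pairings instead.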