Precise contention-aware performance prediction on virtualized multicore system

Virtualized multicore contention - aware performance prediction model.Virtual machine contention sensitivity and intensity features collection.Quantify the precise levels of performance degradation between VMs. Multicore systems are widely deployed in both the embedded and the high end computing infrastructures. However, traditional virtualization systems can not effectively isolate shared micro architectural resources among virtual machines (VMs) running on multicore systems. CPU and memory intensive VMs contending for these resources will lead to serious performance interference, which makes virtualization systems less efficient and VM performance less stable. In this paper, we propose a contention-aware performance prediction model on the virtualized multicore systems to quantify the performance degradation of VMs. First, we identify the performance interference factors and design synthetic micro-benchmarks to obtain VM's contention sensitivity and intensity features that are correlated with VM performance degradation. Second, based on the contention features, we build VM performance prediction model using machine learning techniques to quantify the precise levels of performance degradation. The proposed model can be used to optimize VM performance on multicore systems. Our experimental results show that the performance prediction model achieves high accuracy and the mean absolute error is 2.83%.

[1]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[2]  Christina Delimitrou,et al.  iBench: Quantifying interference for datacenter applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[3]  Tao Li,et al.  Optimizing virtual machine consolidation performance on NUMA server architecture for cloud workloads , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[4]  Francisco J. Cazorla,et al.  Optimal task assignment in multithreaded processors: a statistical approach , 2012, ASPLOS XVII.

[5]  Alexandra Fedorova,et al.  A case for NUMA-aware contention management on multicore systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Onur Mutlu,et al.  Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multicore Memory Systems , 2012, ACM Trans. Comput. Syst..

[7]  Gang Ren,et al.  Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers , 2010, IEEE Micro.

[8]  Xiao Zhang,et al.  Towards practical page coloring-based multicore cache management , 2009, EuroSys '09.

[9]  Jie Liu,et al.  Cuanta: quantifying effects of shared on-chip resource interference for consolidated virtual machines , 2011, SoCC.

[10]  Xiaoning Ding,et al.  ULCC: a user-level facility for optimizing shared cache performance on multicores , 2011, PPoPP '11.

[11]  Christina Delimitrou,et al.  Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.

[12]  Kaushik Dutta,et al.  Application performance modeling in a virtualized environment , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[13]  Fang Liu,et al.  Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors , 2011, SIGMETRICS '11.

[14]  Lingjia Tang,et al.  Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.

[15]  H. Howie Huang,et al.  TRACON: Interference-Aware Schedulingfor Data-Intensive Applicationsin Virtualized Environments , 2011, IEEE Transactions on Parallel and Distributed Systems.

[16]  Li Zhao,et al.  CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[17]  Lingjia Tang,et al.  SMiTe: Precise QoS Prediction on Real-System SMT Processors to Improve Utilization in Warehouse Scale Computers , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[18]  Xi Chen,et al.  Cache contention and application performance prediction for multi-core systems , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[19]  Stijn Eyerman,et al.  Probabilistic job symbiosis modeling for SMT processor scheduling , 2010, ASPLOS 2010.

[20]  Xiao Zhang,et al.  CPI2: CPU performance isolation for shared compute clusters , 2013, EuroSys '13.

[21]  Lingjia Tang,et al.  The impact of memory subsystem resource sharing on datacenter applications , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[22]  Kun Wang,et al.  Optimizing virtual machine scheduling in NUMA multicore systems , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[23]  Kevin Skadron,et al.  PRECISELY PREDICTING PERFORMANCE DEGRADATION DUE TO COLOCATING MULTIPLE EXECUTING APPLICATIONS ON A SINGLE MACHINE IS CRITICAL FOR IMPROVING UTILIZATION IN MODERN , 2012 .

[24]  Manuel Prieto,et al.  Survey of scheduling techniques for addressing shared resources in multicore processors , 2012, CSUR.

[25]  Karsten Schwan,et al.  Region scheduling: efficiently using the cache architectures via page-level affinity , 2012, ASPLOS XVII.

[26]  Jian Pei,et al.  A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[27]  Michael Stumm,et al.  Enhancing operating system support for multicore processors by using hardware performance monitoring , 2009, OPSR.

[28]  Alexandra Fedorova,et al.  Addressing shared resource contention in multicore processors via scheduling , 2010, ASPLOS 2010.

[29]  Aman Kansal,et al.  Q-clouds: managing performance interference effects for QoS-aware clouds , 2010, EuroSys '10.

[30]  Kaushik Dutta,et al.  Modeling virtualized applications using machine learning techniques , 2012, VEE '12.

[31]  Mary Lou Soffa,et al.  Characterizing multi-threaded applications based on shared-resource contention , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[32]  David Black-Schaffer,et al.  Modeling performance variation due to cache sharing , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).

[33]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[34]  Ahmad Yasin,et al.  A Top-Down method for performance analysis and counters architecture , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).