Making Neighbors Quiet: An Approach to Detect Virtual Resource Contention

It is imperative for public cloud providers to guarantee performance targets for tenants' virtual machines (VMs) while respecting strict business confidentiality, i.e., without any information about the applications or their performance. A large body of related work detects performance interference by leveraging clients' quality-of-service (QoS) metrics, e.g., latency, and additional profiling servers. In this paper, we take the perspective of the cloud provider and propose a general black-box approach that detects different resource contentions by throttling neighboring VMs. Specifically, we design a three-phase detection algorithm that includes: (i) an alarm phase to identify statistical outliers using control charts; (ii) a passive clustering phase to match the current sample to historical behaviors; and (iii) an active throttling phase to distinguish contention from application phase changes by throttling neighbors. The algorithm is specifically designed for scenarios where multiple co-located VMs request detection analysis simultaneously. We implement and evaluate the proposed three-phase algorithm on four latency-sensitive applications, i.e., Wikimedia and three benchmarks from CloudSuite. Our extensive experimental results show that we can reach an average detection accuracy above 90 percent while limiting the performance degradation experienced by offender workloads to short learning phases.
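The alarm phase (i) relies on control charts to flag statistical outliers. The abstract does not state which chart type or which metric is used, so the following is only a minimal sketch under assumptions: a Shewhart-style chart with limits at the baseline mean plus or minus k sample standard deviations, applied to a hypothetical per-VM resource metric. The function names, the value k = 3, and the sample values are illustrative and not taken from the paper.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, upper) Shewhart-style limits: mean +/- k * sample stdev.

    `baseline` is assumed to hold metric samples from a contention-free period.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def alarm_indices(baseline, samples, k=3.0):
    """Return the indices of samples that fall outside the control limits."""
    lower, upper = control_limits(baseline, k)
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical metric values (e.g., a per-VM resource counter), for illustration only.
baseline = [10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]
samples = [10.0, 10.5, 25.0, 9.5, 2.0]
print(alarm_indices(baseline, samples))  # -> [2, 4]
```

In this sketch an out-of-limit sample only raises an alarm; deciding whether it reflects contention or an application phase change is left to the later clustering and throttling phases described above.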
