Detecting performance interference in cloud-based web services

Web services increasingly rely on public cloud platforms. The virtualization technologies that public clouds employ can, however, cause virtual machines (VMs) to contend for shared physical machine (PM) resources, leading to performance problems for the Web service. Past studies have exploited PM-level performance metrics such as clock cycles per instruction to detect such platform-induced performance interference. Unfortunately, public cloud customers do not have access to these metrics. They can typically access only VM-level metrics and application-level metrics such as transaction response times, and these metrics alone are often not useful for detecting inter-VM contention. This makes it difficult for Web service operators to detect and manage platform-induced performance interference inside the cloud. To address this problem, we propose a machine-learning-based interference detection technique. The technique applies collaborative filtering to predict whether a given transaction being processed by a Web service is suffering from interference. A management controller can then use the results to trigger remedial actions, e.g., reporting problems to the system manager or switching cloud providers. Results using a realistic Web benchmark show that the approach is effective: the most effective variant detects about 96% of performance interference events with almost no false alarms.
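To make the idea of collaborative-filtering-based detection concrete, the following is a minimal sketch, not the method evaluated in the paper. The abstract does not specify the features, the similarity measure, or the decision rule, so everything here is an assumption: a memory-based (neighbourhood) filter predicts the interference-free response time of a transaction from the most similar baseline observations, and a transaction is flagged when its observed response time exceeds that prediction by a hypothetical tolerance factor. The metric names, the sample data, and the functions `similarity`, `predict_response_time`, and `is_interfered` are illustrative only.

```python
from math import dist

# Hypothetical baseline gathered without interference:
# (VM-level metrics: CPU utilisation, normalised request rate) -> response time in ms.
baseline = [
    ((0.20, 0.10), 50.0),
    ((0.25, 0.12), 55.0),
    ((0.80, 0.60), 120.0),
    ((0.85, 0.65), 130.0),
]

def similarity(a, b):
    """Distance-based similarity in (0, 1]; an assumed choice, not taken from the paper."""
    return 1.0 / (1.0 + dist(a, b))

def predict_response_time(rows, context, k=2):
    """Similarity-weighted mean response time of the k baseline rows closest to `context`."""
    ranked = sorted(rows, key=lambda row: similarity(row[0], context), reverse=True)[:k]
    weights = [similarity(metrics, context) for metrics, _ in ranked]
    weighted_sum = sum(w * rt for w, (_, rt) in zip(weights, ranked))
    return weighted_sum / sum(weights)

def is_interfered(rows, context, observed_rt, tolerance=1.5, k=2):
    """Flag a transaction whose observed time exceeds `tolerance` times the prediction."""
    return observed_rt > tolerance * predict_response_time(rows, context, k)

context = (0.22, 0.11)  # VM-level metrics observed alongside the transaction
print(is_interfered(baseline, context, observed_rt=160.0))  # True
print(is_interfered(baseline, context, observed_rt=54.0))   # False
```

In this sketch the VM-level metrics play the role of the known entries and the response time is the entry being predicted; a real deployment would need a tuned tolerance and a much larger baseline.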
