Assessing Invariant Mining Techniques for Cloud-Based Utility Computing Systems

Likely system invariants model properties that hold under the normal operating conditions of a computing system. Invariants may be mined offline from training datasets or inferred during execution. Prior research has shown that invariant mining techniques support several activities, including capacity planning and the detection of failures, anomalies, and violations of Service Level Agreements. However, their practical application by operations engineers is still a challenge. We aim to fill this gap through an empirical analysis of three major techniques for mining invariants in cloud-based utility computing systems: clustering, association rules, and decision lists. The experiments use independent datasets from real-world systems: a Google cluster, whose traces are publicly available, and a Software-as-a-Service platform used by various companies worldwide. We assess the techniques in two applications of invariants, namely execution characterization and anomaly detection, using the metrics of coverage, recall, and precision. A sensitivity analysis is also performed. The experimental results yield practical usage implications: relatively few invariants characterize the majority of operating conditions; precision and recall may drop significantly when trying to achieve a large coverage; and the techniques exhibit similar precision, although the supervised one achieves a higher recall. Finally, we propose a general heuristic for selecting likely invariants from a dataset.
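
As an illustration of the three metrics named above, the following minimal sketch computes coverage, precision, and recall for a set of invariants expressed as predicates over execution records. The definitions used here are assumptions made for illustration, not the paper's exact formulation: coverage is taken as the fraction of executions satisfying at least one invariant, and an execution is flagged as anomalous when it satisfies none. The field names (`cpu`, `mem`), thresholds, and sample records are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Execution = Dict[str, float]
Invariant = Callable[[Execution], bool]


def coverage(invariants: List[Invariant], executions: List[Execution]) -> float:
    """Fraction of executions satisfying at least one invariant (assumed definition)."""
    if not executions:
        return 0.0
    covered = sum(1 for e in executions if any(inv(e) for inv in invariants))
    return covered / len(executions)


def is_anomalous(invariants: List[Invariant], execution: Execution) -> bool:
    """Flag an execution that satisfies none of the invariants (assumed rule)."""
    return not any(inv(execution) for inv in invariants)


def precision_recall(
    invariants: List[Invariant],
    executions: List[Execution],
    labels: List[bool],
) -> Tuple[float, float]:
    """Precision and recall of the anomaly flags against ground-truth labels."""
    tp = fp = fn = 0
    for execution, truly_anomalous in zip(executions, labels):
        flagged = is_anomalous(invariants, execution)
        if flagged and truly_anomalous:
            tp += 1
        elif flagged and not truly_anomalous:
            fp += 1
        elif not flagged and truly_anomalous:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical invariants: resource usage is either jointly low or jointly high.
invariants: List[Invariant] = [
    lambda e: e["cpu"] <= 0.5 and e["mem"] <= 0.5,
    lambda e: e["cpu"] > 0.5 and e["mem"] > 0.5,
]

# Hypothetical execution records with ground-truth anomaly labels.
executions: List[Execution] = [
    {"cpu": 0.2, "mem": 0.3},
    {"cpu": 0.8, "mem": 0.9},
    {"cpu": 0.9, "mem": 0.1},
    {"cpu": 0.1, "mem": 0.95},
    {"cpu": 0.3, "mem": 0.4},
]
labels: List[bool] = [False, False, True, False, True]

cov = coverage(invariants, executions)
precision, recall = precision_recall(invariants, executions, labels)
print(f"coverage={cov:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy setting, adding looser invariants would raise coverage but could also cause truly anomalous executions to satisfy some invariant, which is one way the trade-off between coverage and recall described in the abstract can arise.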
