An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions

AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive volumes of data produced during the operation of large-scale systems. However, due to the nature of operations data, AIOps modeling faces several challenges related to data splitting, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate how different modeling decisions affect them. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures from trace data in a large-scale cluster environment and (2) predicting disk failures from disk monitoring data in a large-scale cloud storage environment. First, we observe that data leakage does occur in AIOps solutions. Splitting the training and validation datasets by time can significantly reduce such leakage, making a time-based split more appropriate than a random split in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models helps mitigate the impact of such drift, although the performance benefit and the modeling cost of more frequent updates depend largely on the application data and the models used. Our findings encourage future research and practice in developing AIOps solutions to pay close attention to data-splitting decisions in order to handle the data leakage and concept drift challenges.
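To make the two data-splitting decisions above concrete, the sketch below contrasts a time-based train/validation split (every training sample strictly precedes every validation sample) with a random split that can leak future information into training, and adds a simple periodic-retraining loop that refreshes the model each period. This is a minimal illustration, not the paper's implementation: the column names (`timestamp`, `failed`), the monthly update schedule, and the choice of a random forest evaluated with MCC are our own assumptions.

```python
# A minimal sketch (our own illustration, not the paper's code) of
# time-based vs. random train/validation splitting and of periodic
# retraining. Column names and the update period are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split


def time_based_split(df, feature_cols, label_col, train_frac=0.7):
    """Every training sample strictly precedes every validation
    sample, so no future information leaks into training."""
    df = df.sort_values("timestamp")
    cut = int(len(df) * train_frac)
    train, valid = df.iloc[:cut], df.iloc[cut:]
    return (train[feature_cols], train[label_col],
            valid[feature_cols], valid[label_col])


def random_split(df, feature_cols, label_col, train_frac=0.7):
    """For contrast: a random split may place later samples in the
    training set, which can inflate estimated performance."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        df[feature_cols], df[label_col],
        train_size=train_frac, random_state=42)
    return X_tr, y_tr, X_va, y_va


def periodic_retraining(df, feature_cols, label_col, period="M"):
    """Retrain each period on all data seen so far and evaluate on
    the next period, mimicking periodic model updates in operation."""
    labels = df["timestamp"].dt.to_period(period)
    periods = labels.sort_values().unique()
    scores = []
    for train_end, test_period in zip(periods[:-1], periods[1:]):
        past, future = df[labels <= train_end], df[labels == test_period]
        if past[label_col].nunique() < 2 or future.empty:
            continue  # need both classes to fit and data to evaluate on
        model = RandomForestClassifier(random_state=42)
        model.fit(past[feature_cols], past[label_col])
        preds = model.predict(future[feature_cols])
        scores.append(matthews_corrcoef(future[label_col], preds))
    return scores


if __name__ == "__main__":
    # Hypothetical synthetic data standing in for operations data.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "timestamp": pd.date_range("2023-01-01", periods=1000, freq="h"),
        "cpu": rng.random(1000),
        "mem": rng.random(1000),
        "failed": rng.integers(0, 2, 1000),
    })
    X_tr, y_tr, X_va, y_va = time_based_split(df, ["cpu", "mem"], "failed")
    print("per-period MCC:", periodic_retraining(df, ["cpu", "mem"], "failed"))
```

A time-based split mirrors how such models are deployed, since only past data is available at training time; comparing its validation score against the random split's gives a rough sense of how much leakage inflates reported performance.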
