Automating Root Cause Analysis via Machine Learning in Agile Software Testing Environments

We apply machine learning to automate root cause analysis in agile software testing environments. We first interview testing engineers (the human experts) and, guided by their input, extract relevant features from raw log data. Our initial efforts focus on clustering the unlabeled data. Several clusters correlate weakly with failure root causes, but the remaining clusters are too ambiguous to interpret, which leads us to label the data. A second round of interviews with the testing engineers yields five ground-truth categories. Using manually labeled data, we train artificial neural networks that either classify the data or pre-process it for clustering. The resulting method achieves an accuracy of 88.9%. The methodology serves as a prototype, or baseline approach, for extracting expert knowledge and adapting it to machine learning techniques for root cause analysis in agile environments.
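As an illustration of how agreement between cluster assignments and root-cause labels might be quantified, the following minimal sketch computes two common measures: purity and normalized mutual information. The choice of these measures, the function names, and the toy data are assumptions made for illustration; they are not taken from the paper and do not reproduce its evaluation.

```python
from collections import Counter
from math import log, sqrt


def purity(clusters, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    per_cluster = {}
    for c, y in zip(clusters, labels):
        per_cluster.setdefault(c, Counter())[y] += 1
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(labels)


def entropy(counts, n):
    """Shannon entropy (in nats) of a partition given its group sizes."""
    return -sum((k / n) * log(k / n) for k in counts.values())


def normalized_mutual_information(clusters, labels):
    """Mutual information divided by the geometric mean of both entropies."""
    n = len(labels)
    cluster_sizes = Counter(clusters)
    label_sizes = Counter(labels)
    joint = Counter(zip(clusters, labels))
    mi = sum(
        (n_cy / n) * log(n_cy * n / (cluster_sizes[c] * label_sizes[y]))
        for (c, y), n_cy in joint.items()
    )
    h_c = entropy(cluster_sizes, n)
    h_y = entropy(label_sizes, n)
    if h_c == 0 or h_y == 0:
        # One side has a single group, so there is nothing to correlate.
        return 0.0
    return mi / sqrt(h_c * h_y)


if __name__ == "__main__":
    # Hypothetical cluster ids and root-cause labels for six test failures.
    toy_clusters = [0, 0, 0, 1, 1, 2]
    toy_labels = ["env", "env", "product", "product", "product", "test"]
    print("purity:", purity(toy_clusters, toy_labels))
    print("NMI:", normalized_mutual_information(toy_clusters, toy_labels))
```

Under this sketch, a cluster that mixes several labels lowers both scores, which is one way to make the ambiguity of a cluster concrete before deciding whether labeling is worth the effort.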
