On the adoption and impact of predictive analytics for server incident reduction

The Predictive Analytics for Server Incident Reduction (PASIR) solution developed at IBM has been deployed to 130 IT environments since the beginning of 2014. These environments belong to various industries around the world, and their infrastructures are serviced by IBM support groups. Incidents occurring on servers are reported, together with descriptions of the problems, into a ticket management system. The tickets are then resolved by the assigned support teams, which record the resolution steps taken in the system. PASIR works in two steps. First, it uses ticket descriptions and resolutions to classify the incident tickets of an IT environment and identify high-impact incidents, namely those describing server unavailability and performance degradation. Second, it uses multivariate analysis to correlate the occurrence of these high-impact tickets with server properties and utilization measures, in order to identify troubled server configurations and prescribe improvement actions. In this paper, we present the findings from deploying our two-step machine learning model in the field. In particular, we describe the PASIR methodology, from ticket classification to the recommendation of modernization actions. We also assess the process of manual ticket labeling and the impact of noisy input data on our automatic classifier, and we demonstrate the model's effectiveness by comparing the predicted impact of prescriptive actions with actual system improvements.
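To make the two-step structure concrete, the following is a minimal sketch and not the PASIR implementation. It assumes a keyword match as a stand-in for the trained ticket classifier and a simple per-configuration ticket rate as a stand-in for the multivariate analysis. The keyword list, field names, function names, and sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keywords; a stand-in for a trained ticket classifier.
HIGH_IMPACT_KEYWORDS = ("server down", "unavailable", "not responding",
                        "high cpu", "slow")


def is_high_impact(description, resolution):
    """Step 1 (stand-in): flag a ticket using its description and resolution text."""
    text = f"{description} {resolution}".lower()
    return any(keyword in text for keyword in HIGH_IMPACT_KEYWORDS)


def high_impact_rate_by_config(tickets, servers):
    """Step 2 (stand-in): high-impact tickets per server, grouped by configuration.

    servers maps a server id to a configuration tuple;
    tickets is a list of dicts with 'server', 'description', 'resolution'.
    """
    servers_per_config = defaultdict(int)
    for config in servers.values():
        servers_per_config[config] += 1

    tickets_per_config = defaultdict(int)
    for ticket in tickets:
        config = servers.get(ticket["server"])
        if config is not None and is_high_impact(ticket["description"],
                                                 ticket["resolution"]):
            tickets_per_config[config] += 1

    return {config: tickets_per_config[config] / count
            for config, count in servers_per_config.items()}


# Hypothetical sample data, for illustration only.
servers = {
    "s1": ("AIX 5.3", "8GB"),
    "s2": ("AIX 5.3", "8GB"),
    "s3": ("RHEL 6", "32GB"),
}
tickets = [
    {"server": "s1", "description": "Server down after reboot",
     "resolution": "restarted service"},
    {"server": "s2", "description": "High CPU utilization alert",
     "resolution": "killed runaway process"},
    {"server": "s2", "description": "Password reset request",
     "resolution": "reset password"},
    {"server": "s3", "description": "Disk quota increase",
     "resolution": "extended filesystem"},
    {"server": "s1", "description": "Application slow",
     "resolution": "added memory"},
]

rates = high_impact_rate_by_config(tickets, servers)
for config, rate in sorted(rates.items()):
    print(config, rate)
```

In this toy data the older configuration accumulates 1.5 high-impact tickets per server and the newer one none, which is the kind of signal the second step would use to flag a troubled configuration. A real deployment would replace both stand-ins with learned models.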
