System situation ticket identification using SVMs ensemble

A framework for situation ticket identification using an SVM ensemble is introduced. A domain word discovery algorithm for obtaining domain knowledge and words is proposed. A selective labeling policy based on the discovered domain words is presented. An ensemble of SVM classification models for accurate ticket classification is developed.

System maintenance for large and complex IT infrastructures depends heavily on automatic system monitoring, and the performance of system monitoring depends on the configurations specified by system administrators. Misconfigurations and frequent configuration changes are the two main causes of false positives (false alarms), which consume limited maintenance resources, and false negatives (missing alerts), which can cause serious system faults. Identifying situation tickets that are created by humans is therefore a critical task: it helps system administrators correct and improve the configurations of existing monitoring systems to minimize false negatives.

To address this issue, this paper proposes a situation ticket identification approach based on an ensemble of Support Vector Machines (SVMs), named STI-E, that discovers situation tickets among the manual tickets created by humans. A primary advantage of this solution is that, using domain words discovered from historical monitoring tickets, it selects the most representative tickets from the imbalanced manual tickets so that administrators can label them with minimal effort. The proposed SVM ensemble classification model also identifies situation tickets with higher accuracy than the classical SVM classification model. To demonstrate the effectiveness of the proposed approach, we empirically validate it on real system monitoring tickets and manual tickets from a large enterprise IT infrastructure.
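The abstract does not say how the STI-E ensemble is built, so the following is only a minimal sketch of one common way to train an SVM ensemble on imbalanced ticket text, not the authors' method. It assumes that situation tickets are the minority class, that each ensemble member is a linear SVM trained on all minority tickets plus an equally sized random sample of the majority class, and that members are combined by majority vote. The Pegasos-style trainer, the bag-of-words features, the function names, and the toy tickets are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def build_vocab(texts):
    """Sorted list of all distinct lowercase tokens in the training texts."""
    return sorted({w for t in texts for w in t.lower().split()})

def vectorize(text, vocab):
    """Bag-of-words counts over a fixed vocabulary (unknown words are ignored)."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, lam=0.01, epochs=50, rng=None):
    """Linear SVM without bias, fitted by Pegasos-style subgradient steps
    on the L2-regularized hinge loss. Labels must be +1 or -1."""
    if rng is None:
        rng = random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            violated = y[i] * dot(w, X[i]) < 1.0   # margin check before the update
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink step from the regularizer
            if violated:
                eta = 1.0 / (lam * t)
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def train_ensemble(X, y, n_members=5, seed=0):
    """Each member sees every minority example plus an equally sized
    random sample of the majority class (assumed balancing strategy)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    members = []
    for _ in range(n_members):
        idx = minority + rng.sample(majority, len(minority))
        members.append(
            train_linear_svm([X[i] for i in idx], [y[i] for i in idx], rng=rng)
        )
    return members

def predict(members, x):
    """Majority vote over the members; +1 means 'situation ticket'."""
    votes = sum(1 if dot(w, x) > 0 else -1 for w in members)
    return 1 if votes > 0 else -1

# Hypothetical toy data: 3 situation tickets versus 6 other manual tickets.
situation = [
    "disk full on database server",
    "disk usage above threshold",
    "low disk space on host",
]
other = [
    "password reset request",
    "reset password for new employee",
    "user forgot password",
    "password expired please renew",
    "unlock account and change password",
    "password policy question",
]
texts = situation + other
labels = [1] * len(situation) + [-1] * len(other)
vocab = build_vocab(texts)
X = [vectorize(t, vocab) for t in texts]
members = train_ensemble(X, labels, n_members=5)

print(predict(members, vectorize("disk almost full", vocab)))    # 1
print(predict(members, vectorize("forgot my password", vocab)))  # -1
```

Using an odd number of members avoids tied votes. In practice the members could be any SVM implementation; the balanced resampling and the vote are the parts this sketch is meant to show.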
