Impact of HW and OS type and currency on server availability derived from problem ticket analysis

Technology refresh is an important component of data center management. The goal of this paper is to assess the impact of HW and OS currency on server availability, based on a large set of incident tickets and server attribute data collected from several different IT environments. To achieve this, we first identify server failure incidents using a machine learning method for automatic ticket classification. We then analyze the data to examine how HW and OS type, along with their currency, affect server failure rates. The results can be used to derive guidelines that support technology refresh decisions in data centers.
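As a rough illustration of the second step (relating failure rates to server attributes), the sketch below counts failure tickets per server within each attribute group. It is not the paper's method: the field names (`id`, `os`, `os_current`, `server_id`, `is_failure`), the sample records, and the `failure_rates` helper are all hypothetical, and the tickets are assumed to have already been labeled by some classifier.

```python
from collections import defaultdict


def failure_rates(servers, tickets, key):
    """Return failure tickets per server for each group defined by `key`.

    All record fields used here are hypothetical; `is_failure` is assumed
    to come from a prior ticket-classification step.
    """
    server_count = defaultdict(int)
    failure_count = defaultdict(int)
    group_of = {}

    # Assign every server to a group and count group sizes.
    for s in servers:
        g = key(s)
        group_of[s["id"]] = g
        server_count[g] += 1

    # Count only tickets labeled as failures that map to a known server.
    for t in tickets:
        if t["is_failure"] and t["server_id"] in group_of:
            failure_count[group_of[t["server_id"]]] += 1

    return {g: failure_count[g] / n for g, n in server_count.items()}


# Made-up sample data, for illustration only.
servers = [
    {"id": "s1", "os": "os_a", "os_current": True},
    {"id": "s2", "os": "os_a", "os_current": False},
    {"id": "s3", "os": "os_a", "os_current": False},
]
tickets = [
    {"server_id": "s1", "is_failure": False},
    {"server_id": "s2", "is_failure": True},
    {"server_id": "s2", "is_failure": True},
    {"server_id": "s3", "is_failure": True},
]

# Group by OS type and whether the OS version is current.
rates = failure_rates(servers, tickets, key=lambda s: (s["os"], s["os_current"]))
```

Passing a different `key` function (for example, one that groups by a hardware model field) would give the same comparison along another attribute; a real analysis would also need to normalize by observation period.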
