Bug Severity Assessment in Cross Project Context and Identifying Training Candidates

Automatic bug severity prediction is useful for prioritising development effort, allocating resources, and assigning bug fixers. It requires historical data on which classifiers can be trained; in the absence of such data, cross-project prediction offers a practical alternative. In this paper, our objective is to automate bug severity prediction using the terms in bug-report summaries and to identify the best training candidates in a cross-project context. Text mining is used to extract terms from the summaries, and classifiers are trained on these terms. Sixty-three training candidates were designed by combining seven datasets of Eclipse projects to develop the severity prediction models. To deal with the imbalanced bug data problem, we employed two ensemble approaches, using two operators available in RapidMiner: Vote and Bagging. Results show that k-Nearest Neighbour (k-NN) outperforms the Support Vector Machine (SVM), while the f-measure of Naive Bayes is poor, remaining below 34.25%. For k-NN, building training candidates from more than one training dataset improves both f-measure and accuracy. The two ensemble approaches improve the f-measure by up to 5% and 10%, respectively, for severity levels with fewer bug reports than the major level. We further evaluate cross-project bug severity prediction between Eclipse and Mozilla products; results show that, with the SVM and k-NN classifiers, Mozilla products can be used to build reliable prediction models for Eclipse products and vice versa.
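
As a concrete illustration of the pipeline described above, the following sketch (in Python with scikit-learn, not the authors' RapidMiner workflow) extracts tf-idf terms from bug-report summaries and trains a k-NN severity classifier. The summaries and severity labels are hypothetical toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical bug-report summaries and severity labels.
summaries = [
    "NullPointerException when opening editor",
    "UI freezes while indexing workspace",
    "Typo in preferences dialog label",
    "Crash on startup after update",
]
severities = ["critical", "major", "trivial", "blocker"]

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # tokenise summaries, weight terms by tf-idf
    KNeighborsClassifier(n_neighbors=3),    # k-NN, reported to outperform SVM here
)
model.fit(summaries, severities)
print(model.predict(["editor crashes with NullPointerException"]))
```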
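
One plausible reading of the 63 training candidates is combinatorial: holding one of the seven Eclipse datasets out for testing leaves six, whose non-empty combinations number 2^6 - 1 = 63. The sketch below enumerates such candidates; the dataset names are placeholders, and the paper's exact construction may differ.

```python
from itertools import combinations

# Six placeholder datasets remaining after one is held out for testing.
datasets = ["JDT", "PDE", "Platform", "CDT", "Equinox", "Mylyn"]

# Every non-empty subset of the six datasets is one training candidate.
candidates = [
    subset
    for r in range(1, len(datasets) + 1)
    for subset in combinations(datasets, r)
]
print(len(candidates))  # 63
```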
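
The Vote and Bagging operators used to counter class imbalance can be approximated in scikit-learn as below; the base learners and parameters are illustrative, not the paper's exact configuration.

```python
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Majority vote over heterogeneous base learners, akin to RapidMiner's Vote operator.
vote = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("nb", MultinomialNB()),
        ("svm", LinearSVC()),
    ],
    voting="hard",
)

# Bootstrap-aggregated k-NN, akin to RapidMiner's Bagging operator.
# (`estimator=` assumes scikit-learn >= 1.2; older versions use `base_estimator=`.)
bagging = BaggingClassifier(
    estimator=KNeighborsClassifier(n_neighbors=5),
    n_estimators=10,
    random_state=0,
)

# Either ensemble can replace the single k-NN in the earlier pipeline sketch:
model = make_pipeline(TfidfVectorizer(stop_words="english"), vote)
```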
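
Finally, the cross-project setup amounts to fitting on one product's reports and scoring on another's. In the sketch below, load_reports is a hypothetical stand-in for whatever Bugzilla export the study used; it returns toy data here so the example runs end to end.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def load_reports(product):
    # Hypothetical loader returning (summary, severity) pairs for a product.
    toy = {
        "mozilla_thunderbird": (
            ["crash composing mail", "slow search in folders", "typo in menu"],
            ["critical", "major", "trivial"],
        ),
        "eclipse_jdt": (
            ["crash opening java editor", "slow build of workspace", "typo in wizard"],
            ["critical", "major", "trivial"],
        ),
    }
    return toy[product]

X_train, y_train = load_reports("mozilla_thunderbird")  # train on a Mozilla product ...
X_test, y_test = load_reports("eclipse_jdt")            # ... test on an Eclipse product

model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("macro f-measure:", f1_score(y_test, pred, average="macro"))
```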
