Predicting the priority of a reported bug using machine learning techniques and cross project validation

In bug repositories, we receive a large number of bug reports on daily basis. Managing such a large repository is a challenging job. Priority of a bug tells that how important and urgent it is for us to fix. Priority of a bug can be classified into 5 levels from PI to P5 where PI is the highest and P5 is the lowest priority. Correct prioritization of bugs helps in bug fix scheduling/assignment and resource allocation. Failure of this will result in delay of resolving important bugs. This requires a bug prediction system which can predict the priority of a newly reported bug. Cross project validation is also an important concern in empirical software engineering where we train classifier on one project and test it for prediction on other projects. In the available literature, we found very few papers for bug priority prediction and none of them dealt with cross project validation. In this paper, we have evaluated the performance of different machine learning techniques namely Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbors (KNN) and Neural Network (NNet) in predicting the priority of the newly coming reports on the basis of different performance measures. We performed cross project validation for 76 cases of five data sets of open office and eclipse projects. The accuracy of different machine learning techniques in predicting the priority of a reported bug within and across project is found above 70% except Naive Bayes technique.

[1]  Per Runeson,et al.  Guidelines for conducting and reporting case study research in software engineering , 2009, Empirical Software Engineering.

[2]  Bart Goethals,et al.  Predicting the severity of a reported bug , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[3]  Dawson R. Engler,et al.  Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations , 2003, SAS.

[4]  Sunghun Kim,et al.  How long did it take to fix bugs? , 2006, MSR '06.

[5]  Tao Xie,et al.  An approach to detecting duplicate bug reports using natural language and execution information , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[6]  Gail C. Murphy,et al.  Reducing the effort of bug report triage: Recommenders for development-oriented decisions , 2011, TSEM.

[7]  Gail C. Murphy,et al.  Automatic bug triage using text categorization , 2004, SEKE.

[8]  John Anvik,et al.  Assisting bug report triage through recommendation , 2007 .

[9]  Franz Wotawa,et al.  Automatic Software Bug Triage System (BTS) Based on Latent Semantic Indexing and Support Vector Machine , 2009, 2009 Fourth International Conference on Software Engineering Advances.

[10]  K. K. Chaturvedi,et al.  Determining Bug severity using machine learning techniques , 2012, 2012 CSI Sixth International Conference on Software Engineering (CONSEG).

[11]  Onaiza Maqbool,et al.  Bug Prioritization to Facilitate Bug Report Triage , 2012, Journal of Computer Science and Technology.

[12]  Fang Wu,et al.  Predicting Defect Priority Based on Neural Networks , 2010, ADMA.

[13]  Ingo Mierswa,et al.  YALE: rapid prototyping for complex data mining tasks , 2006, KDD '06.

[14]  Gerardo Canfora,et al.  Supporting change request assignment in open source development , 2006, SAC.

[15]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.

[16]  Ahmed Tamrawi,et al.  Fuzzy set-based automatic bug triaging. , 2011, ICSE 2011.

[17]  Onaiza Maqbool,et al.  Managing Open Bug Repositories through Bug Report Prioritization Using SVMs , 2010 .

[18]  Michael D. Ernst,et al.  Prioritizing Warning Categories by Analyzing Software History , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[19]  Ying Zou,et al.  Studying the fix-time for bugs in large open source projects , 2011, Promise '11.

[20]  Andreas Zeller,et al.  Predicting Effort to Fix Software Bugs , 2007, Softwaretechnik-Trends.

[21]  Tim Menzies,et al.  Automated severity assessment of software defect reports , 2008, 2008 IEEE International Conference on Software Maintenance.

[22]  Ahmed Tamrawi,et al.  Fuzzy set-based automatic bug triaging: NIER track , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[23]  M. F. Porter,et al.  An algorithm for suffix stripping , 1997 .