Improving expert prediction of issue resolution time

Predicting the resolution times of issue reports in software development is important because it helps allocate resources adequately. However, issue resolution time (IRT) prediction is difficult, and prediction quality is limited. A common approach in industry is to base predictions on expert knowledge. While this manual approach requires the availability and effort of experts, automated approaches using data mining and machine learning techniques require a small upfront investment for setting up the data collection and analysis infrastructure, as well as the availability of sufficient past data for model building. Several approaches for automated IRT prediction have been proposed and evaluated. The aim of our study was (1) to compare the prediction quality of expert-based IRT prediction in a software company located in Estonia with that of various fully automated IRT prediction approaches proposed and used by other researchers, including k-means clustering, k-nearest neighbor classification, Naïve Bayes classification, decision trees, random forest (RF), and ordered logistic regression (OLR), and (2) to improve the current IRT prediction quality in the company at hand. For our study, we analyzed issue reports collected by the company in the period from April 2011 to January 2015. Regarding our first goal, we found that experts in the case company were able to predict IRTs within ±10% of the actual IRTs approximately 50% of the time. In addition, 67% of the experts' predictions had an absolute error less than or equal to 0.5 hours. When applying the automated approaches used by other researchers to the company's data, we observed lower prediction quality than that of the company's experts, even for the best-performing approaches, RF and OLR.
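The two accuracy criteria mentioned above can be sketched as follows. This is a minimal illustration with hypothetical numbers, not the study's data: `pred_within_pct` computes the share of predictions within ±p% of the actual IRT, and `abs_err_within` computes the share with an absolute error at or below a threshold in hours.

```python
def pred_within_pct(actual, predicted, pct=10):
    """Fraction of predictions within ±pct% of the actual values."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if a > 0 and abs(p - a) / a <= pct / 100)
    return hits / len(actual)

def abs_err_within(actual, predicted, hours=0.5):
    """Fraction of predictions with absolute error <= hours."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(p - a) <= hours)
    return hits / len(actual)

# Hypothetical resolution times (hours): actual vs. expert prediction
actual    = [2.0, 8.0, 1.0, 40.0]
predicted = [2.1, 6.0, 1.2, 41.0]
print(pred_within_pct(actual, predicted))  # 0.5: two of four within ±10%
print(abs_err_within(actual, predicted))   # 0.5: two of four within 0.5 h
```

Note that the two criteria reward different kinds of predictions: the relative criterion favors accuracy on long-running issues, while the absolute 0.5-hour criterion is easiest to meet on short ones.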
Regarding our second goal, after unsuccessfully experimenting with improvements to the RF- and OLR-based approaches, we managed to develop models based on text analysis that achieved a prediction quality on par with, or better than, that of the company's experts.
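The idea behind text-based IRT prediction can be illustrated with a deliberately simple sketch: represent each issue report as a bag-of-words vector and predict the resolution time of the most cosine-similar past issue. The issue texts, times, and the nearest-neighbor scheme here are all hypothetical; the study's actual text-analysis models are more elaborate.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_irt(history, new_text):
    """Predict IRT (hours) as the IRT of the most similar past issue."""
    query = Counter(new_text.lower().split())
    best = max(history,
               key=lambda h: cosine(Counter(h[0].lower().split()), query))
    return best[1]

# Hypothetical past issue reports: (report text, resolution time in hours)
history = [
    ("login page crashes on submit", 3.0),
    ("typo in settings dialog label", 0.5),
    ("database connection timeout under load", 12.0),
]
print(predict_irt(history, "crash when submitting the login form"))  # 3.0
```

In practice, such approaches typically weight terms (e.g. TF-IDF), stem or lemmatize tokens, and cluster similar reports rather than relying on a single nearest neighbor.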