Predicting the delay of issues with due dates in software projects

Issue-tracking systems (e.g. JIRA) have increasingly been used in many software projects. An issue could represent a software bug, a new requirement or a user story, or even a project task. A deadline can be imposed on an issue either explicitly, by assigning a due date to it, or implicitly, by assigning it to a release and having it inherit the release's deadline. This paper presents a novel approach to providing automated support for project managers and other decision makers in predicting whether an issue is at risk of being delayed against its deadline. A set of features (hereafter called risk factors) characterizing delayed issues was extracted from eight open source projects: Apache, Duraspace, Java.net, JBoss, JIRA, Moodle, Mulesoft, and WSO2. Risk factors with good discriminative power were selected to build predictive models that predict whether the resolution of an issue is at risk of being delayed. Our predictive models are able to predict both the extent of the delay and the likelihood of the delay occurring. The evaluation results demonstrate the effectiveness of our predictive models, which achieve on average 79% precision, 61% recall, 68% F-measure, and 83% Area Under the ROC Curve. Our predictive models also have low error rates: on average 0.66 for Macro-averaged Mean Cost-Error and 0.72 for Macro-averaged Mean Absolute Error.
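The evaluation measures named above can be illustrated with a minimal sketch. It assumes the standard definitions of precision, recall, and F-measure for a binary "delayed" class, and the usual definition of Macro-averaged Mean Absolute Error for ordinal labels (the absolute error is averaged within each true class, then across classes). The function names, the integer encoding of delay classes, and the sample labels are hypothetical and are not taken from the paper. Macro-averaged Mean Cost-Error is not covered here.

```python
from collections import defaultdict


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for one class (assumed: 1 = delayed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_averaged_mae(y_true, y_pred):
    """Mean absolute error per true class, averaged over the classes present.

    Labels are assumed to be integers on an ordinal scale
    (hypothetically 0 = no delay, 1 = minor, 2 = major delay).
    """
    errors_by_class = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        errors_by_class[t].append(abs(t - p))
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
    print(macro_averaged_mae([0, 0, 0, 1, 2], [0, 0, 1, 1, 0]))
```

Averaging per class first means that a rare delay class counts as much as the frequent "no delay" class, which is why this measure suits imbalanced ordinal data.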
