Characterizing and predicting blocking bugs in open source projects

As software becomes increasingly important, its quality becomes an increasingly important issue. Therefore, prior work focused on software quality and proposed many prediction models to identify the location of software bugs, to estimate their fixing-time, etc. However, one special type of severe bugs is blocking bugs. Blocking bugs are software bugs that prevent other bugs from being fixed. These blocking bugs may increase maintenance costs, reduce overall quality and delay the release of the software systems. In this paper, we study blocking-bugs in six open source projects and propose a model to predict them. Our goal is to help developers identify these blocking bugs early on. We collect the bug reports from the bug tracking systems of the projects, then we obtain 14 different factors related to, for example, the textual description of the bug, the location the bug is found in and the people involved with the bug. Based on these factors we build decision trees for each project to predict whether a bug will be a blocking bug or not. Then, we analyze these decision trees in order to determine which factors best indicate these blocking bugs. Our results show that our prediction models achieve F-measures of 15-42%, which is a two- to four-fold improvement over the baseline random predictors. We also find that the most important factors in determining blocking bugs are the comment text, comment size, the number of developers in the CC list of the bug report and the reporter's experience. Our analysis shows that our models reduce the median time to identify a blocking bug by 3-18 days.

[1]  Rongxin Wu,et al.  Dealing with noise in defect prediction , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[2]  Tao Xie,et al.  An approach to detecting duplicate bug reports using natural language and execution information , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[3]  Taghi M. Khoshgoftaar,et al.  Experimental perspectives on learning from imbalanced data , 2007, ICML '07.

[4]  Thomas Zimmermann,et al.  Optimized assignment of developers for fixing bugs an initial evaluation for eclipse projects , 2009, 2009 3rd International Symposium on Empirical Software Engineering and Measurement.

[5]  Harald C. Gall,et al.  Predicting the fix time of bugs , 2010, RSSE '10.

[6]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[7]  Gary M. Weiss Mining with rarity: a unifying framework , 2004, SKDD.

[8]  Anh Tuan Nguyen,et al.  Multi-layered approach for recovering links between bug reports and fixes , 2012, SIGSOFT FSE.

[9]  Andreas Zeller,et al.  Where Should We Fix This Bug? A Two-Phase Recommendation Model , 2013, IEEE Transactions on Software Engineering.

[10]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[11]  Gregory Tassey,et al.  Prepared for what , 2007 .

[12]  Lucas D. Panjer Predicting Eclipse Bug Lifetimes , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[13]  Gail C. Murphy,et al.  Automatic bug triage using text categorization , 2004, SEKE.

[14]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[15]  Yuanyuan Zhou,et al.  Bug characteristics in open source software , 2013, Empirical Software Engineering.

[16]  Bart Goethals,et al.  Predicting the severity of a reported bug , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[17]  Mohammad Zulkernine,et al.  Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities , 2011, J. Syst. Archit..

[18]  Philip J. Guo,et al.  Characterizing and predicting which bugs get reopened , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[19]  Premkumar T. Devanbu,et al.  Sample size vs. bias in defect prediction , 2013, ESEC/FSE 2013.

[20]  Tim Menzies,et al.  Automated severity assessment of software defect reports , 2008, 2008 IEEE International Conference on Software Maintenance.

[21]  Thomas Zimmermann,et al.  Duplicate bug reports considered harmful … really? , 2008, 2008 IEEE International Conference on Software Maintenance.

[22]  Per Runeson,et al.  Detection of Duplicate Defect Reports Using Natural Language Processing , 2007, 29th International Conference on Software Engineering (ICSE'07).

[23]  Ken-ichi Matsumoto,et al.  Studying re-opened bugs in open source software , 2012, Empirical Software Engineering.

[24]  Harald C. Gall,et al.  Comparing fine-grained source code changes and code churn for bug prediction , 2011, MSR '11.

[25]  L. Erlikh,et al.  Leveraging legacy system dollars for e-business , 2000 .

[26]  Siau-Cheng Khoo,et al.  Towards more accurate retrieval of duplicate bug reports , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[27]  MockusAudris,et al.  A Large-Scale Empirical Study of Just-in-Time Quality Assurance , 2013 .

[28]  He Jiang,et al.  Towards Training Set Reduction for Bug Triage , 2011, 2011 IEEE 35th Annual Computer Software and Applications Conference.

[29]  Ahmed E. Hassan,et al.  Using Decision Trees to Predict the Certification Result of a Build , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[30]  Tibor Gyimóthy,et al.  Empirical validation of object-oriented metrics on open source software for fault prediction , 2005, IEEE Transactions on Software Engineering.

[31]  Yuming Zhou,et al.  Empirical analysis of network measures for effort-aware fault-proneness prediction , 2016, Inf. Softw. Technol..

[32]  David Lo,et al.  A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction , 2013, CSMR 2013.

[33]  Taghi M. Khoshgoftaar,et al.  Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study , 2004, Empirical Software Engineering.

[34]  Victor R. Basili,et al.  A Validation of Object-Oriented Design Metrics as Quality Indicators , 1996, IEEE Trans. Software Eng..

[35]  David P. Darcy,et al.  Managerial Use of Metrics for Object-Oriented Software: An Exploratory Analysis , 1998, IEEE Trans. Software Eng..

[36]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[37]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.

[38]  B. Efron Estimating the Error Rate of a Prediction Rule: Improvement on Cross-Validation , 1983 .

[39]  Ying Zou,et al.  Studying the fix-time for bugs in large open source projects , 2011, Promise '11.

[40]  Witold Pedrycz,et al.  A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[41]  Andreas Zeller,et al.  How Long Will It Take to Fix This Bug? , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[42]  Gustavo E. A. P. A. Batista,et al.  Learning with Skewed Class Distributions , 2002 .

[43]  Ahmed E. Hassan,et al.  A qualitative study on performance bugs , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[44]  Harvey P. Siy,et al.  Predicting Fault Incidence Using Software Change History , 2000, IEEE Trans. Software Eng..

[45]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[46]  Ahmed E. Hassan,et al.  Should I contribute to this discussion? , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[47]  Michele Lanza,et al.  On the Relationship Between Change Coupling and Software Defects , 2009, 2009 16th Working Conference on Reverse Engineering.

[48]  Nicholas Jalbert,et al.  Automated duplicate detection for bug tracking systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[49]  Harald C. Gall,et al.  Putting It All Together: Using Socio-technical Networks to Predict Failures , 2009, 2009 20th International Symposium on Software Reliability Engineering.

[50]  Iulian Neamtiu,et al.  Bug-fix time prediction models: can we do better? , 2011, MSR '11.

[51]  Marsha Chechik,et al.  Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds , 2008 .

[52]  Gail C. Murphy,et al.  Reducing the effort of bug report triage: Recommenders for development-oriented decisions , 2011, TSEM.

[53]  Serge Demeyer,et al.  Comparing Mining Algorithms for Predicting the Severity of a Reported Bug , 2011, 2011 15th European Conference on Software Maintenance and Reengineering.

[54]  Ramanath Subramanyam,et al.  Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects , 2003, IEEE Trans. Software Eng..

[55]  Punam Bedi,et al.  Predicting the priority of a reported bug using machine learning techniques and cross project validation , 2012, 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA).

[56]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[57]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[58]  Foutse Khomh,et al.  Is it a bug or an enhancement?: a text-based approach to classify change requests , 2008, CASCON '08.