It's not a bug, it's a feature: How misclassification impacts bug prediction

In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all bug reports to be misclassified - that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We discuss the impact of this misclassification on earlier studies and recommend manual data validation for future studies.

[1]  Gail C. Murphy,et al.  Hipikat: recommending pertinent software development artifacts , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[2]  Marsha Chechik,et al.  Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds , 2008 .

[3]  Audris Mockus,et al.  Missing Data in Software Engineering , 2008, Guide to Advanced Empirical Software Engineering.

[4]  Hui Zeng,et al.  Estimation of software defects fix effort using neural networks , 2004, Proceedings of the 28th Annual International Computer Software and Applications Conference, 2004. COMPSAC 2004..

[5]  Gail C. Murphy,et al.  Automatic bug triage using text categorization , 2004, SEKE.

[6]  Abraham Bernstein,et al.  LINKSTER: enabling efficient manual inspection and annotation of mined data , 2010, FSE '10.

[7]  Rahul Premraj,et al.  Network Versus Code Metrics to Predict Defects: A Replication Study , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[8]  Yi Zhang,et al.  Classifying Software Changes: Clean or Buggy? , 2008, IEEE Transactions on Software Engineering.

[9]  A. Zeller,et al.  Predicting Defects for Eclipse , 2007, Third International Workshop on Predictor Models in Software Engineering (PROMISE'07: ICSE Workshops 2007).

[10]  André van der Hoek,et al.  Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering , 2010, FSE 2010.

[11]  Janice Singer,et al.  Hipikat: a project memory for software development , 2005, IEEE Transactions on Software Engineering.

[12]  Thomas Zimmermann,et al.  Quality of bug reports in Eclipse , 2007, eclipse '07.

[13]  Thomas Zimmermann,et al.  What Makes a Good Bug Report? , 2008, IEEE Transactions on Software Engineering.

[14]  Thomas Zimmermann,et al.  When do changes induce fixes? On Fridays , 2005 .

[15]  Foutse Khomh,et al.  Is it a bug or an enhancement?: a text-based approach to classify change requests , 2008, CASCON '08.

[16]  Ahmed E. Hassan,et al.  A Case Study of Bias in Bug-Fix Datasets , 2010, 2010 17th Working Conference on Reverse Engineering.

[17]  Philip J. Guo,et al.  "Not my bug!" and other reasons for software bug report reassignments , 2011, CSCW.

[18]  Harald C. Gall,et al.  Populating a Release History Database from version control and bug tracking systems , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[19]  Gail C. Murphy,et al.  Reducing the effort of bug report triage: Recommenders for development-oriented decisions , 2011, TSEM.

[20]  Andreas Zeller,et al.  Predicting component failures at design time , 2006, ISESE '06.

[21]  Nachiappan Nagappan,et al.  Predicting defects using network analysis on dependency graphs , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[22]  Martin Shepperd,et al.  Data Sets and Data Quality in Software Engineering: Eight Years On , 2016, PROMISE.

[23]  Audris Mockus,et al.  International Workshop on Mining Software Repositories , 2004 .

[24]  Westley Weimer,et al.  Modeling bug report quality , 2007, ASE '07.

[25]  Janice Singer,et al.  Guide to Advanced Empirical Software Engineering , 2007 .

[26]  Premkumar T. Devanbu,et al.  The missing links: bugs and bug-fix commits , 2010, FSE '10.

[27]  Rongxin Wu,et al.  Dealing with noise in defect prediction , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[28]  Rongxin Wu,et al.  ReLink: recovering links between bugs and changes , 2011, ESEC/FSE '11.

[29]  Harald C. Gall,et al.  Predicting the fix time of bugs , 2010, RSSE '10.

[30]  Tao Xie,et al.  An approach to detecting duplicate bug reports using natural language and execution information , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[31]  Andreas Zeller,et al.  Predicting faults from cached history , 2008, ISEC '08.

[32]  Claes Wohlin,et al.  Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering , 2006 .

[33]  Per Runeson,et al.  Detection of Duplicate Defect Reports Using Natural Language Processing , 2007, 29th International Conference on Software Engineering (ICSE'07).

[34]  Andreas Zeller,et al.  Change Bursts as Defect Predictors , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[35]  Gregorio Robles,et al.  Effort estimation by characterizing developer activity , 2006, EDSER '06.

[36]  Andreas Zeller,et al.  When do changes induce fixes? , 2005, ACM SIGSOFT Softw. Eng. Notes.

[37]  Alexander Egyed,et al.  Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering , 2007, ASE 2007.

[38]  Andreas Zeller,et al.  How Long Will It Take to Fix This Bug? , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[39]  Philip J. Guo,et al.  Characterizing and predicting which bugs get fixed: an empirical study of Microsoft Windows , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[40]  Thomas Zimmermann,et al.  Extraction of bug localization benchmarks from history , 2007, ASE.

[41]  Premkumar T. Devanbu,et al.  Fair and balanced?: bias in bug-fix datasets , 2009, ESEC/FSE '09.

[42]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.