Mining Software Defects: Should We Consider Affected Releases?

With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies of software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, this heuristic approach rests on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the differences between defect datasets generated by the heuristic approach and by the realistic approach, which leverages the earliest affected release that a software development team realistically estimates for a given defect. In addition, we investigate the impact of these defect identification approaches on the predictive accuracy and on the ranking of defective modules produced by defect models. Through a case study of defect datasets of 32 releases, we find that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. Surprisingly, we find that the heuristic approach has a minimal impact on defect count models, suggesting that future work need not be overly concerned about defect count models constructed from heuristic defect datasets. On the other hand, using defect datasets generated by the realistic approach leads to an improvement in the predictive accuracy of defect classification models.
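
To make the contrast between the two identification approaches concrete, the sketch below applies both to toy data. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: the commit and issue records, the JIRA-style issue keys, the defect-keyword pattern, and the six-month post-release window (a common convention in prior defect studies) are all assumptions chosen for the example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical inputs: in a real study these would be mined from the
# version control system and the issue tracker (e.g., JIRA).
commits = [
    {"files": ["core/parser.py"], "date": datetime(2019, 2, 10),
     "message": "Fix crash in parser (JIRA-101)"},
    {"files": ["net/client.py"], "date": datetime(2019, 3, 5),
     "message": "Fix timeout handling (JIRA-102)"},
    {"files": ["docs/readme.txt"], "date": datetime(2019, 4, 1),
     "message": "fix typos in documentation"},
]
issues = {
    # issue key -> type and the earliest release the team marked as affected
    "JIRA-101": {"type": "Bug", "earliest_affected_release": "1.0"},
    "JIRA-102": {"type": "Bug", "earliest_affected_release": "0.9"},
}
release = {"name": "1.0", "date": datetime(2019, 1, 1)}

DEFECT_KEYWORDS = re.compile(r"\b(fix|bug|defect|error)\b", re.IGNORECASE)
ISSUE_ID = re.compile(r"JIRA-\d+")
POST_RELEASE_WINDOW = timedelta(days=180)  # assumed 6-month window

def heuristic_defective_modules(commits, release):
    """Heuristic approach: a module is defective if a commit within the
    post-release window mentions a defect keyword or an issue ID."""
    defective = set()
    window_end = release["date"] + POST_RELEASE_WINDOW
    for c in commits:
        in_window = release["date"] <= c["date"] <= window_end
        looks_like_fix = (DEFECT_KEYWORDS.search(c["message"])
                          or ISSUE_ID.search(c["message"]))
        if in_window and looks_like_fix:
            defective.update(c["files"])
    return defective

def realistic_defective_modules(commits, issues, release):
    """Realistic approach: a module is defective only if its fixing commit
    links to a bug report whose earliest affected release is this release."""
    defective = set()
    for c in commits:
        for key in ISSUE_ID.findall(c["message"]):
            issue = issues.get(key)
            if (issue and issue["type"] == "Bug"
                    and issue["earliest_affected_release"] == release["name"]):
                defective.update(c["files"])
    return defective

print("Heuristic:", heuristic_defective_modules(commits, release))
print("Realistic:", realistic_defective_modules(commits, issues, release))
```

On this toy data the heuristic approach labels all three modules as defective for release 1.0, because every commit falls in the window and matches the keyword pattern. The realistic approach labels only core/parser.py: JIRA-102's earliest affected release is 0.9, so that defect is attributed to the earlier release, and the documentation commit links to no bug report at all. This is exactly the kind of divergence between the two datasets that the paper investigates.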
