Comparing fine-grained source code changes and code churn for bug prediction - A retrospective

More than two decades ago, researchers started to mine the data stored in software repositories to help software developers make informed decisions when developing and testing software systems. Bug prediction was one of the most promising and popular research directions; it uses the data stored in software repositories to predict the bug-proneness of source files or the number of bugs they contain. On that topic, and as part of Emanuel's PhD studies, we submitted a paper titled Comparing fine-grained source code changes and code churn for bug prediction [8] to the 8th Working Conference on Mining Software Repositories (MSR), held in 2011 in beautiful Honolulu, Hawaii. Ten years later, it was selected as one of the finalists for the MSR 2021 Most Influential Paper Award. In the following, we provide a retrospective on our work, describing the road to publishing this paper, its impact on the field of bug prediction, and the road ahead.

[1] Witold Pedrycz, et al. A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction, 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[2] Lech Madeyski, et al. Which process metrics can significantly improve defect prediction models? An empirical study, 2014, Software Quality Journal.

[3] Alessandro F. Garcia, et al. Trading robustness for maintainability: an empirical study of evolving C# programs, 2014, ICSE.

[4] Foutse Khomh, et al. Analyzing the Impact of Antipatterns on Change-Proneness Using Fine-Grained Source Code Changes, 2012, 2012 19th Working Conference on Reverse Engineering.

[5] Matias Martinez, et al. Mining software repair models for reasoning on the search space of automated program fixing, 2013, Empirical Software Engineering.

[6] Vedran Ljubovic, et al. Plagiarism Detection in Computer Programming Using Feature Extraction From Ultra-Fine-Grained Repositories, 2020, IEEE Access.

[7] Bojana Dalbelo Basic, et al. Stability of Software Defect Prediction in Relation to Levels of Data Imbalance, 2013, SQAMIA.

[8] Hridesh Rajan, et al. A study of repetitiveness of code changes in software evolution, 2013, 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[9] Harald C. Gall, et al. Change Analysis with Evolizer and ChangeDistiller, 2009, IEEE Software.

[10] Giovanni Vigna, et al. SPIDER: Enabling Fast Patch Propagation In Related Software Repositories, 2020, 2020 IEEE Symposium on Security and Privacy (SP).

[11] Zhiyong Feng, et al. Inferring Patterns for Taint-Style Vulnerabilities With Security Patches, 2019, IEEE Access.

[12] Anita Sarma, et al. Planning for Untangling: Predicting the Difficulty of Merge Conflicts, 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[13] Michael W. Godfrey, et al. The MSR Cookbook: Mining a decade of research, 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[14] Harald C. Gall, et al. Comparing fine-grained source code changes and code churn for bug prediction, 2011, MSR '11.

[15] Martin Pinzger, et al. Method-level bug prediction, 2012, Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement.

[16] N. Nagappan, et al. Use of relative code churn measures to predict system defect density, 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005.

[17] Herman Akdag, et al. Performance and cost-effectiveness of change burst metrics in predicting software faults, 2018, Knowledge and Information Systems.

[18] Matias Martinez, et al. Fine-grained and accurate source code differencing, 2014, ASE.

[19] Harald C. Gall, et al. Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction, 2007, IEEE Transactions on Software Engineering.