Reproducibility and credibility in empirical software engineering: A case study based on a systematic literature review of the use of the SZZ algorithm

Abstract. Context: Reproducibility is essential to the credibility of Empirical Software Engineering (ESE) studies, since it allows the research community to verify, evaluate, and build upon research outcomes. Objective: We study reproducibility and credibility in ESE through a case study, investigating how they have been addressed in publications that apply SZZ, a widely used algorithm by Śliwerski, Zimmermann and Zeller for detecting the changes that introduced a bug. Methodology: We performed a systematic literature review of publications that use SZZ. In total, 187 papers were analyzed for reproducibility, reporting of limitations, and use of improved versions of the algorithm. Results: We found considerable room for improvement in ESE: reproducibility is uncommon, and factors that undermine the credibility of results are frequent. We offer lessons learned and guidelines for researchers and reviewers to address this problem. Conclusion: Reproducibility, and other related aspects that ensure a high-quality scientific process, should receive more attention from the ESE community in order to increase the credibility of research results.
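To make the object of the case study concrete, the sketch below illustrates the core idea of the original SZZ algorithm as commonly described: given a bug-fixing commit, blame the lines it deleted or modified in the parent revision to obtain candidate bug-introducing commits. This is a minimal Python sketch using plain git commands; the repository path and commit hash in the usage note are hypothetical, and the filtering steps of SZZ and its improved variants (e.g. discarding candidates made after the bug report, ignoring cosmetic changes) are omitted.

# Minimal sketch of the core SZZ idea (Śliwerski, Zimmermann & Zeller, 2005):
# for a bug-fixing commit, blame the lines it removed to find the commits
# that last touched them (candidate bug-introducing changes).
import re
import subprocess


def run_git(repo, *args):
    """Run a git command in `repo` and return its stdout as text."""
    return subprocess.run(
        ["git", "-C", repo, *args], capture_output=True, text=True, check=True
    ).stdout


def bug_introducing_candidates(repo, fix_commit):
    """Return the commits that last touched the lines removed by `fix_commit`."""
    candidates = set()
    # Diff the fix against its parent with zero context lines, so every
    # hunk contains only the changed lines.
    diff = run_git(repo, "diff", "-U0", f"{fix_commit}^", fix_commit)
    current_file = None
    for line in diff.splitlines():
        if line.startswith("--- a/"):
            current_file = line[len("--- a/"):]
        elif line.startswith("@@") and current_file:
            # Hunk header: @@ -<old_start>,<old_count> +<new_start>,<new_count> @@
            m = re.match(r"@@ -(\d+)(?:,(\d+))? \+", line)
            if not m:
                continue
            start = int(m.group(1))
            count = int(m.group(2) or "1")
            if count == 0:
                continue  # pure addition: nothing was deleted, nothing to blame
            # Blame the deleted lines in the parent of the fixing commit.
            blame = run_git(
                repo, "blame", "-l",
                "-L", f"{start},{start + count - 1}",
                f"{fix_commit}^", "--", current_file,
            )
            for blame_line in blame.splitlines():
                candidates.add(blame_line.split()[0])
    return candidates


# Example usage (path and hash are placeholders):
# print(bug_introducing_candidates("/path/to/repo", "abc1234"))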
