Code Coverage and Postrelease Defects: A Large-Scale Study on Open Source Projects

Testing is a pivotal activity for ensuring software quality, and code coverage is a common yardstick for measuring the efficacy and adequacy of testing. But does higher coverage actually lead to fewer postrelease bugs? Do files with higher test coverage actually receive fewer bug reports? The direct relationship between code coverage and actual bug reports has not yet been examined in a comprehensive empirical study of real bugs; past studies involve only a few software systems or artificially injected bugs (mutants). In this empirical study, we examine these questions for open-source software projects using their actual reported bugs. We analyze 100 large open-source Java projects and measure the code coverage achieved by the test suites shipped with these projects. We collect real bugs logged in the issue tracking systems after the release of the software and analyze the correlations between code coverage and these bugs. We also collect other metrics, such as cyclomatic complexity and lines of code, which we use to normalize bug counts and coverage, to correlate coverage against these metrics, and as variables in regression analysis. Our results show that coverage has an insignificant correlation with the number of postrelease bugs at the project level, and no such correlation at the file level.
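To make the analysis concrete, the sketch below illustrates the kind of file-level correlation and regression computations the abstract describes. The data layout, field names, and values are hypothetical placeholders, not the authors' dataset or tooling; rank correlation and least-squares regression stand in here for the statistical methods the study applies.

```python
# A minimal sketch of the study's file-level analysis, assuming a hypothetical
# per-file dataset. All values below are illustrative only.
import numpy as np
from scipy.stats import spearmanr

# Columns: line coverage %, postrelease bug count, LOC, cyclomatic complexity.
files = np.array([
    [85.0, 1, 1200, 150],
    [42.5, 3,  800, 220],
    [70.0, 0,  450,  60],
    [15.0, 2, 2000, 310],
    [93.0, 1,  600,  90],
])
coverage, bugs, loc, complexity = files.T

# Rank correlation between coverage and raw postrelease bug counts.
rho, p = spearmanr(coverage, bugs)
print(f"coverage vs. bugs:     rho={rho:.3f} (p={p:.3f})")

# Correlation with bug density (bugs normalized by LOC), mirroring the
# paper's normalization of bug counts by size and complexity metrics.
rho_d, p_d = spearmanr(coverage, bugs / loc)
print(f"coverage vs. bugs/LOC: rho={rho_d:.3f} (p={p_d:.3f})")

# Least-squares regression of bug counts on coverage, LOC, and complexity,
# standing in for the regression analysis mentioned in the abstract.
X = np.column_stack([np.ones_like(coverage), coverage, loc, complexity])
coef, *_ = np.linalg.lstsq(X, bugs, rcond=None)
print("regression coefficients (intercept, coverage, LOC, complexity):", coef)
```

In this setup, a negative and statistically significant coefficient or correlation for coverage would suggest that better-covered files have fewer postrelease bugs; the study reports no such relationship at the file level.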
