A Large-Scale Study of Test Coverage Evolution

Statement coverage is commonly used as a measure of test suite quality. Coverage is often used as part of a code review process: if a patch decreases overall coverage, or is itself not covered, then the patch is scrutinized more closely. Traditional studies of how coverage changes with code evolution have examined the overall coverage of the entire program, and more recent work directly examines the coverage of patches (changed statements). We present a large-scale evaluation of code coverage evolution over 7,816 builds of 47 projects written in popular languages including Java, Python, and Scala. This evaluation is much larger than prior studies and also considers a new, important kind of change: coverage changes of unchanged statements. We find that in large, mature projects, simply measuring the change in statement coverage does not capture the nuances of code evolution. Going beyond treating statement coverage as a simple ratio, we examine how the set of covered statements evolves between project revisions. We present and study new ways to assess the impact of a patch on a project's test suite quality that separate coverage of the patch from coverage of the non-patch code, and that separate changes in the coverage ratio from changes in the set of statements covered.
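
The distinction between coverage as a ratio and coverage as a set of statements can be illustrated with a minimal sketch. This is not the tooling used in the study; the `Build` type, the `compare` function, and the statement identifiers are hypothetical, and the sketch assumes that every statement carries an identifier that stays stable across revisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Hypothetical snapshot of one build: statement ids and which are covered."""
    statements: frozenset  # all executable statements in this revision
    covered: frozenset     # subset of statements executed by the test suite


def coverage_ratio(build):
    """Statement coverage as the usual ratio of covered to total statements."""
    if not build.statements:
        return 0.0
    return len(build.covered) / len(build.statements)


def compare(old, new):
    """Separate patch coverage from coverage changes of unchanged statements.

    Assumes statement ids are stable across revisions, so set operations
    identify which statements were added and which were kept.
    """
    unchanged = old.statements & new.statements
    patch = new.statements - old.statements
    return {
        "ratio_delta": coverage_ratio(new) - coverage_ratio(old),
        "patch_total": len(patch),
        "patch_covered": len(patch & new.covered),
        # Unchanged statements whose coverage status flipped between builds.
        "newly_covered_unchanged": unchanged & (new.covered - old.covered),
        "newly_uncovered_unchanged": unchanged & (old.covered - new.covered),
    }


# Illustrative data only: the patch adds statements "e" and "f".
old = Build(frozenset("abcd"), frozenset("ab"))
new = Build(frozenset("abcdef"), frozenset("ace"))
result = compare(old, new)
print(result)
```

In this made-up example the coverage ratio is identical in both builds, yet one unchanged statement gained coverage, another lost it, and only half of the patch is covered. A ratio-only check would report no change at all.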
