Understanding and Improving Regression Test Selection in Continuous Integration

Developers rely on regression testing in their continuous integration (CI) environment to find changes that introduce regression faults. While regression testing is widely practiced, it can be costly. Regression test selection (RTS) reduces the cost of regression testing by skipping the tests that are unaffected by the changes. Industry has adopted module-level RTS for its CI environments, while researchers have proposed class-level RTS. In this paper, we compare module- and class-level RTS techniques in a cloud-based CI environment, Travis. We also develop and evaluate a hybrid RTS technique that combines aspects of the module- and class-level RTS techniques. We evaluate all the techniques on real Travis builds. We find that the RTS techniques do save testing time compared to running all tests (RetestAll), but the percentage of time for a full build using RTS (76.0%) is not as low as found in previous work, due to the extra overhead in a cloud-based CI environment. Moreover, we inspect test failures from RetestAll builds, and although we find that RTS techniques can fail to select tests that failed, these test failures are almost all flaky test failures. As such, RTS techniques provide additional value in helping developers avoid wasting time debugging failures not related to the recent code changes. Overall, our results show that RTS can be beneficial for developers in the CI environment: RTS not only saves time but also avoids misleading developers with flaky test failures.
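To make the difference in granularity concrete, the following is a minimal sketch of module-level and class-level selection, not the implementation evaluated in the paper. It assumes the dependency information is already available as plain dictionaries; the function names, the example modules (`core`, `web`, `util`), and the test names are all hypothetical. The hybrid technique is not sketched because the abstract does not say how the two levels are combined.

```python
from typing import Dict, Set


def affected_modules(module_deps: Dict[str, Set[str]],
                     changed_modules: Set[str]) -> Set[str]:
    """Return the changed modules plus every module that transitively depends on one."""
    affected = set(changed_modules)
    grew = True
    while grew:  # iterate to a fixpoint over the module dependency graph
        grew = False
        for module, deps in module_deps.items():
            if module not in affected and deps & affected:
                affected.add(module)
                grew = True
    return affected


def select_module_level(module_deps: Dict[str, Set[str]],
                        module_tests: Dict[str, Set[str]],
                        changed_modules: Set[str]) -> Set[str]:
    """Module-level RTS: run all tests of every affected module."""
    selected: Set[str] = set()
    for module in affected_modules(module_deps, changed_modules):
        selected |= module_tests.get(module, set())
    return selected


def select_class_level(test_deps: Dict[str, Set[str]],
                       changed_classes: Set[str]) -> Set[str]:
    """Class-level RTS: run a test class if it changed or depends on a changed class."""
    return {test for test, deps in test_deps.items()
            if test in changed_classes or deps & changed_classes}


# Hypothetical project: module -> modules it depends on.
MODULE_DEPS = {"core": set(), "web": {"core"}, "util": set()}
# Hypothetical mapping: module -> test classes it contains.
MODULE_TESTS = {
    "core": {"CoreTest", "ParserTest"},
    "web": {"WebTest"},
    "util": {"UtilTest"},
}
# Hypothetical mapping: test class -> classes it depends on.
TEST_DEPS = {
    "CoreTest": {"Core"},
    "ParserTest": {"Parser"},
    "WebTest": {"Core", "Web"},
    "UtilTest": {"Util"},
}

# A change to class Parser, which lives in module core.
module_selection = select_module_level(MODULE_DEPS, MODULE_TESTS, {"core"})
class_selection = select_class_level(TEST_DEPS, {"Parser"})
```

In this toy example, module-level selection runs `CoreTest`, `ParserTest`, and `WebTest`, while class-level selection runs only `ParserTest`. The finer granularity selects fewer tests but needs per-class dependency tracking, which is part of the overhead that the comparison in a cloud-based CI environment has to account for.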
