Evaluating test-suite reduction in real software evolution

Test-suite reduction (TSR) speeds up regression testing by removing redundant tests from the test suite, so that fewer tests run in future builds. To decide whether to use TSR, a developer needs some way to predict how well the reduced test suite will detect real faults in future builds compared to the original test suite. Prior research evaluated the cost of TSR using only program versions with seeded faults, but such evaluations do not explicitly predict the effectiveness of the reduced test suite in future builds. We perform the first extensive study of TSR using real test failures from failed builds that occurred for real code changes. We analyze 1478 failed builds from 32 GitHub projects that run their tests on Travis. Because each failed build can have multiple faults, we propose a family of mappings from test failures to faults. We use these mappings to compute Failed-Build Detection Loss (FBDL), the percentage of failed builds in which the reduced test suite fails to detect all the faults detected by the original test suite. We find that FBDL can be up to 52.2%, which is higher than traditional TSR metrics suggest. Moreover, traditional TSR metrics are not good predictors of FBDL, making it difficult for developers to decide whether to use reduced test suites.

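To make the FBDL metric concrete, the following is a minimal sketch of how it could be computed, assuming each failed build is represented by two sets: the faults detected by the original test suite and the faults detected by the reduced test suite (both obtained from some chosen failure-to-fault mapping). The function name and data layout are illustrative assumptions, not taken from the paper's artifact.

```python
def fbdl(builds):
    """Percentage of failed builds where the reduced test suite misses at
    least one fault that the original test suite detects (illustrative sketch)."""
    missed = sum(
        1 for original_faults, reduced_faults in builds
        # the reduced suite misses some fault iff the original's faults
        # are not a subset of the reduced suite's faults
        if not original_faults <= reduced_faults
    )
    return 100.0 * missed / len(builds)

# Hypothetical example: 3 failed builds; in the second one, the reduced
# suite misses fault "f2", so FBDL = 1/3 = 33.3%.
builds = [
    ({"f1"}, {"f1"}),
    ({"f1", "f2"}, {"f1"}),
    ({"f3"}, {"f3"}),
]
print(f"FBDL = {fbdl(builds):.1f}%")  # -> FBDL = 33.3%
```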