iFixFlakies: a framework for automatically fixing order-dependent flaky tests

Regression testing provides important pass or fail signals that developers use to make decisions after code changes. However, flaky tests, which can both pass and fail even when the code has not changed, can mislead developers. A common kind of flaky test is the order-dependent test, which passes or fails depending on the order in which the tests are run. Fixing order-dependent tests is often tedious and time-consuming. We propose iFixFlakies, a framework for automatically fixing order-dependent tests. The key insight in iFixFlakies is that test suites often already contain tests, which we call helpers, whose logic sets or resets the state that order-dependent tests need in order to pass. iFixFlakies searches a test suite for helpers that make the order-dependent tests pass and then recommends patches for the order-dependent tests using code from these helpers. Our evaluation on 110 truly order-dependent tests from a public dataset shows that 58 of them have helpers, and iFixFlakies can fix all 58. We opened pull requests for 56 order-dependent tests (2 of the 58 had already been fixed), and developers have already accepted pull requests for 21 of them, with all the remaining ones still pending.
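
The helper search described above can be illustrated with a minimal sketch. This is not the iFixFlakies implementation: the shared `shared` dictionary, the test functions (`polluter`, `victim`, `helper`, `unrelated`), and the functions `passes_in_order` and `find_helpers` are all hypothetical names chosen for illustration, and the sketch assumes the failing order is already known. It only shows the idea of running each candidate test just before an order-dependent test and keeping the candidates that make it pass.

```python
from typing import Callable, List

Test = Callable[[], None]

# Hypothetical state shared between tests (an assumption for this sketch).
shared = {"cache": None}


def polluter() -> None:
    # Leaves a value behind in the shared state.
    shared["cache"] = "stale"
    assert shared["cache"] == "stale"


def victim() -> None:
    # Order-dependent test: passes only when the cache is empty.
    assert shared["cache"] is None


def helper() -> None:
    # Resets the shared state as part of its own logic.
    shared["cache"] = None
    assert shared["cache"] is None


def unrelated() -> None:
    # Does not touch the shared state.
    assert 1 + 1 == 2


def passes_in_order(order: List[Test], target: Test) -> bool:
    """Run `order` from a fresh state and report whether `target` passed."""
    shared["cache"] = None  # start every run from the same initial state
    target_passed = False
    for test in order:
        try:
            test()
            ok = True
        except AssertionError:
            ok = False
        if test is target:
            target_passed = ok
    return target_passed


def find_helpers(suite: List[Test], failing_prefix: List[Test],
                 target: Test) -> List[Test]:
    """Return tests that make `target` pass when run right before it."""
    return [
        candidate
        for candidate in suite
        if candidate is not target
        and candidate not in failing_prefix
        and passes_in_order(failing_prefix + [candidate, target], target)
    ]


def patched_victim() -> None:
    # Sketch of a patch: reuse the helper's state-resetting logic first.
    helper()
    victim()
```

In this sketch, `find_helpers([polluter, victim, helper, unrelated], [polluter], victim)` keeps only `helper`, and `patched_victim` passes even when it runs after `polluter`. Calling the whole helper is a simplification made here; the abstract says only that the recommended patches use code from the helpers.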
