De-Flake Your Tests: Automatically Locating Root Causes of Flaky Tests in Code At Google

Regression testing is a critical part of software development and maintenance. It ensures that modifications to existing software do not break existing behavior and functionality. One of the key assumptions about regression tests is that their results are deterministic: when executed without any modifications and with the same configuration, they either always fail or always pass. In practice, however, some tests are non-deterministic; these are called flaky tests. Flaky tests make the results of test runs unreliable, and they disrupt the software development workflow. In this paper, we present a novel technique to automatically identify the locations of the root causes of flaky tests at the code level to help developers debug and fix them. We study the technique on flaky tests across 428 projects at Google. Based on our case studies, the technique helps identify the location of the root causes of flakiness with 82% accuracy. Furthermore, our studies show that integration into the appropriate developer workflows, simplicity of debugging aids, and fully automated fixes are crucial and preferred components for the adoption and usability of flakiness debugging and fixing tools.
