A Qualitative Study on the Sources, Impacts, and Mitigation Strategies of Flaky Tests

Test flakiness is a major testing concern. Flaky tests produce non-deterministic outcomes that disrupt continuous integration and lead developers to investigate false alerts. Industrial reports indicate that, at large scale, the accumulation of flaky tests erodes trust in test suites and incurs significant computational cost. To alleviate this, practitioners must identify flaky tests and investigate their impact. To shed light on such mitigation mechanisms, we interview 14 practitioners with the aim of identifying (i) the sources of flakiness within the testing ecosystem, (ii) the impacts of flakiness, (iii) the measures adopted by practitioners when addressing flakiness, and (iv) the automation opportunities for these measures. Our analysis shows that, besides the tests and code, flakiness stems from interactions between the system components, the testing infrastructure, and external factors. We also highlight the impact of flakiness on testing practices and product quality, and show that the adoption of guidelines and a stable infrastructure are key measures in mitigating the problem.
