Lightweight Lexical Test Prioritization for Immediate Feedback

The practice of unit testing gives programmers automated feedback on whether the currently edited program is consistent with the expectations specified in test cases. This feedback is most valuable when it arrives immediately, as defects can then be corrected before they become harder to fix. With growing and longer-running test suites, however, feedback is obtained less frequently and lags behind program changes. The objective of test prioritization is to rank tests so that defects, if present, are found as early as possible or at the least cost. While there are numerous static approaches that rank tests based solely on the current version of a program, we focus on change-based test prioritization, which recommends tests that are likely to fail in response to the most recent program change. The canonical approach relies on coverage data and prioritizes tests that cover the changed region, but obtaining and updating coverage data is costly. More recently, information retrieval techniques that exploit the overlapping vocabulary between a change and the tests have proven to be powerful yet lightweight. In this work, we demonstrate the capabilities of information retrieval for prioritizing tests in dynamic programming languages, using Python as an example. We discuss and measure previously understudied variation points, including how contextual information around a program change can be used, and we design alternatives to the widespread TF-IDF retrieval model that are tailored to retrieving failing tests. To obtain program changes with associated test failures, we designed a tool that generates a large set of faulty changes from version history, along with their test results. Using this data set, we compared existing and new lexical prioritization strategies on four open-source Python projects, showing large improvements over untreated and random test orders and results consistent with related work on statically typed languages. We conclude that lightweight IR-based prioritization strategies are an effective tool for predicting failing tests when coverage data is absent or static analysis is intractable, as it is in dynamic languages. This knowledge can benefit both individual programmers who rely on fast feedback and operators of continuous integration infrastructure, where detecting defects earlier in the build cycle frees resources sooner.
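
The retrieval core of such a strategy is small enough to sketch in full. The following is a minimal illustration, not the implementation evaluated in the paper: it assumes a naive identifier tokenizer and plain TF-IDF weighting with cosine similarity, treating the text of a program change as the query and each test's source code as a document.

    # Minimal sketch of TF-IDF-based lexical test prioritization.
    # Illustration only; the tokenizer and weighting scheme are
    # simplifying assumptions, not the paper's exact design.
    import math
    import re
    from collections import Counter

    def tokenize(text):
        # Extract identifiers, then split snake_case and camelCase
        # into lowercase word parts.
        parts = []
        for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
            for piece in ident.split("_"):
                parts += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", piece)
        return [p.lower() for p in parts]

    def tfidf_vectors(documents):
        # One sparse TF-IDF vector (term -> weight) per document.
        term_counts = [Counter(tokenize(doc)) for doc in documents]
        doc_freq = Counter()
        for counts in term_counts:
            doc_freq.update(counts.keys())
        n = len(documents)
        return [{term: count / sum(counts.values())
                       * math.log(n / doc_freq[term])
                 for term, count in counts.items()}
                for counts in term_counts]

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def prioritize(change_text, tests):
        # tests: list of (test_name, test_source) pairs; returns test
        # names ordered by similarity to the change, highest first.
        vectors = tfidf_vectors([change_text] + [src for _, src in tests])
        change_vec, test_vecs = vectors[0], vectors[1:]
        ranked = sorted(zip(test_vecs, tests),
                        key=lambda pair: cosine(change_vec, pair[0]),
                        reverse=True)
        return [name for _, (name, _) in ranked]

    # Example: the test sharing vocabulary with the change ranks first.
    change = "def apply_discount(price, rate): return price * (1 - rate)"
    tests = [("test_discount", "assert apply_discount(100, 0.25) == 75"),
             ("test_login", "assert login('alice', 'secret') is True")]
    print(prioritize(change, tests))  # ['test_discount', 'test_login']

The variation points studied in the paper would slot into this skeleton: contextual information around a change extends the query text passed to prioritize, and alternative retrieval models replace the weighting in tfidf_vectors.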
