An Empirical Analysis of Blind Tests

Modern software engineers automate as many tests as possible. Test automation allows tests to be run hundreds or thousands of times: hourly, daily, and sometimes continuously. This saves time and money, ensures reproducibility, and ultimately leads to better, cheaper software. Automated tests must include code that checks whether the program's output on the test matches the expected behavior. This code is called the test oracle and is typically implemented as assertions that flag the test as passing if the assertion evaluates to true and as failing if not. Since automated tests are themselves programs, they can contain mistakes of their own. Some lead to false positives, where incorrect behavior is marked as correct, and others to false negatives, where correct behavior is marked as incorrect. This paper identifies and studies a common problem in which test assertions are written incorrectly, so that incorrect behavior goes unrecognized. We call these tests blind because the test cannot see the incorrect behavior. Blind tests cause false positives, essentially wasting the tests. This paper presents results from several human-based studies that assess the frequency of blind tests across different software and different populations of users. In our studies, the percentage of blind tests ranged from a low of 39% to a high of 95%.
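To make the notion of a blind test concrete, the sketch below contrasts a blind JUnit 4 test with a sighted one. It is an illustration only, not material from the paper's studies: the Calculator class, its add method, and the seeded fault are hypothetical. The first test's assertion never examines the computed value, so it passes even though the code is wrong; the second test's assertion compares the value against the expected result and exposes the fault.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import org.junit.Test;

// Hypothetical class under test, with a seeded fault (should be a + b).
class Calculator {
    int add(int a, int b) {
        return a - b;
    }
}

public class CalculatorTest {

    // Blind test: the oracle never inspects the returned value, so the
    // test passes despite the fault, producing a false positive.
    @Test
    public void blindAddTest() {
        Calculator calc = new Calculator();
        Integer result = calc.add(2, 3);
        assertNotNull(result);  // trivially true for any boxed return value
    }

    // Sighted test: the oracle compares the output to the expected value,
    // so the seeded fault makes the test fail, as it should.
    @Test
    public void sightedAddTest() {
        Calculator calc = new Calculator();
        assertEquals(5, calc.add(2, 3));  // fails, revealing the fault
    }
}
```

The only difference between the two tests is the assertion: the blind version's verdict is independent of the program's output, while the sighted version ties its verdict to the expected result.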
