Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities

Test smells attempt to capture design issues in test code that reduce its maintainability. Previous work found such smells to be highly common in automatically generated test cases, but based this result on specific static detection rules; although these rules follow the original definition of "test smells", a recent empirical study showed that developers perceive them as overly strict and unrepresentative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EVOSUITE for 100 Java classes. We perform a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We then benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal of matching developer perceptions of test smells. Our results show that these detection strategies poorly characterize the issues in automatically generated test suites; the older tool's detection strategies, in particular, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definitions of certain test smells, and highlight as-yet-uncharacterized issues. Our findings suggest the need for (i) metrics that better match development practice, and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
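To make the object of study concrete, the following is a minimal sketch (our own illustration, not drawn from the paper's dataset) of what a generated test exhibiting two commonly studied smells might look like. The class under test (java.util.Stack) is chosen purely for illustration; only the naming style (Stack_ESTest, test00, the timeout attribute) mimics EvoSuite's usual output.

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertFalse;

import java.util.Stack;

import org.junit.Test;

// Hypothetical example in the style of an EvoSuite-generated test.
// It exhibits two smells of the kind discussed above:
// - Eager Test: a single test exercises several methods of the class under test.
// - Assertion Roulette: multiple assertions, none with an explanatory message.
public class Stack_ESTest {

    @Test(timeout = 4000)
    public void test00() throws Throwable {
        Stack<Integer> stack0 = new Stack<Integer>();
        Integer integer0 = Integer.valueOf(42);
        stack0.push(integer0);                  // exercises push(...)
        Integer integer1 = stack0.peek();       // exercises peek()
        assertEquals(42, (int) integer1);       // no message: which assertion failed?
        Integer integer2 = stack0.pop();        // exercises pop()
        assertEquals(42, (int) integer2);
        assertFalse(stack0.contains(integer0)); // exercises contains(...)
    }
}

A typical static detection rule would flag test00 as Assertion Roulette simply because it contains more than one assertion without a message, and as an Eager Test because it invokes several production methods in one test; whether such rigid, threshold-based rules reflect how developers actually judge test quality is exactly what the study above calls into question.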
