How Do Automatically Generated Unit Tests Influence Software Maintenance?

Generating unit tests automatically saves time over writing tests manually and can lead to higher code coverage. However, automatically generated tests are usually not based on realistic scenarios and are therefore generally considered less readable. This raises a question about their practical value: every time a test fails, a developer has to decide whether the failure reveals a regression fault in the program under test, or whether the test itself needs to be updated. Does the reduced readability of automatically generated tests outweigh the time savings gained by their automated generation, rendering them more of a hindrance than a help for software maintenance? To answer this question, we performed an empirical study in which participants were presented with a failing test, either automatically generated or manually written, and were asked to identify and fix the cause of the failure. Our experiment and two replications resulted in a total of 150 data points from 75 participants. Whilst maintenance activities take longer when working with automatically generated tests, we found developers to be equally effective with manually written and automatically generated tests. This has implications for how automated test generation is best used in practice, and it indicates a need for research into generating more realistic tests.
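To make the readability contrast concrete, the sketch below sets a test in the style typically produced by generation tools such as EvoSuite or Randoop against a manually written test of the same behaviour. The `StringUtils.abbreviate` class, the test inputs, and the expected values are all hypothetical, invented purely for illustration; they are not drawn from the study's materials.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test, invented for this illustration.
class StringUtils {
    static String abbreviate(String s, int maxLen) {
        return s.length() <= maxLen ? s : s.substring(0, maxLen - 3) + "...";
    }
}

public class AbbreviateTest {

    // Style typical of an automatically generated test: an opaque name and
    // a machine-chosen input with no obvious real-world meaning.
    @Test
    public void test0() {
        String string0 = StringUtils.abbreviate("xW#4q!zzzzzz", 7);
        assertEquals("xW#4...", string0);
    }

    // The same behaviour covered by a manually written test: a descriptive
    // name and a realistic scenario.
    @Test
    public void abbreviateShortensLongTitleAndAppendsEllipsis() {
        String abbreviated = StringUtils.abbreviate("Introduction to Testing", 10);
        assertEquals("Introdu...", abbreviated);
    }
}
```

Both tests exercise the same method, but when the generated-style test fails, its opaque name and arbitrary input give a maintainer no hint about the intended scenario, which is precisely the trade-off the study measures.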
