An Industrial Evaluation of Unit Test Generation: Finding Real Faults in a Financial Application

Automated unit test generation has been extensively studied in the literature in recent years. Previous studies on open source systems have shown that test generation tools are quite effective at detecting faults, but how effective and applicable are they in an industrial application? In this paper, we investigate this question using a life insurance and pension products calculator engine owned by SEB Life & Pension Holding AB Riga Branch. To study fault-finding effectiveness, we extracted 25 real faults from the version history of this software project, and applied two up-to-date unit test generation tools for Java, EVOSUITE and RANDOOP, which implement search-based and feedback-directed random test generation, respectively. Automatically generated test suites detected up to 56.40% (EVOSUITE) and 38.00% (RANDOOP) of these faults. The analysis of our results demonstrates challenges that need to be addressed in order to improve fault detection in test generation tools. In particular, classification of the undetected faults shows that 97.62% of them depend on either "specific primitive values" (50.00%) or the construction of "complex state configuration of objects" (47.62%). To study applicability, we surveyed the developers of the application under test on their experience and opinions about the test generation tools and the generated test cases. This leads to insights on requirements for academic prototypes for successful technology transfer from academic research to industrial practice, such as a need to integrate with popular build tools, and to improve the readability of the generated tests.

[1]  Zoltán Micskei,et al.  Evaluating Symbolic Execution-Based Test Tools , 2015, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST).

[2]  Michael D. Ernst,et al.  Randoop: feedback-directed random testing for Java , 2007, OOPSLA '07.

[3]  Luciano Baresi,et al.  TestFul: An Evolutionary Test Approach for Java , 2010, 2010 Third International Conference on Software Testing, Verification and Validation.

[4]  Andreas Zeller,et al.  Mutation-Driven Generation of Unit Tests and Oracles , 2012, IEEE Trans. Software Eng..

[5]  Phil McMinn,et al.  Search‐based software test data generation: a survey , 2004, Softw. Test. Verification Reliab..

[6]  Georgios Gousios,et al.  When, how, and why developers (do not) test in their IDEs , 2015, ESEC/SIGSOFT FSE.

[7]  Michael D. Ernst,et al.  Feedback-Directed Random Test Generation , 2007, 29th International Conference on Software Engineering (ICSE'07).

[8]  Gordon Fraser,et al.  Do Automatically Generated Unit Tests Find Real Faults? An Empirical Study of Effectiveness and Challenges (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[9]  Gordon Fraser,et al.  EvoSuite at the Second Unit Testing Tool Competition , 2013, FITTEST@ICTSS.

[10]  Gordon Fraser,et al.  Combining Multiple Coverage Criteria in Search-Based Unit Test Generation , 2015, SSBSE.

[11]  Alex Groce,et al.  Code coverage for suite evaluation by developers , 2014, ICSE.

[12]  GORDON FRASER,et al.  A Large-Scale Evaluation of Automated Unit Test Generation Using EvoSuite , 2014, ACM Trans. Softw. Eng. Methodol..

[13]  Gordon Fraser,et al.  1600 faults in 100 projects: automatically finding faults while achieving high coverage with EvoSuite , 2015, Empirical Software Engineering.

[14]  Nikolai Tillmann,et al.  Precise identification of problems for structural test generation , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[15]  Gordon Fraser,et al.  Automated unit test generation for classes with environment dependencies , 2014, ASE.

[16]  Simeon C. Ntafos,et al.  An Evaluation of Random Testing , 1984, IEEE Transactions on Software Engineering.

[17]  Gordon Fraser,et al.  Whole Test Suite Generation , 2013, IEEE Transactions on Software Engineering.

[18]  Gordon Fraser,et al.  EvoSuite: automatic test suite generation for object-oriented software , 2011, ESEC/FSE '11.

[19]  Koushik Sen DART: Directed Automated Random Testing , 2009, Haifa Verification Conference.

[20]  Yannis Smaragdakis,et al.  JCrasher: an automatic robustness tester for Java , 2004, Softw. Pract. Exp..

[21]  Hadi Hemmati,et al.  How Effective Are Code Coverage Criteria? , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[22]  Lionel C. Briand,et al.  Random Testing: Theoretical Results and Practical Implications , 2012, IEEE Transactions on Software Engineering.