The Fitness Function for the Job: Search-Based Generation of Test Suites That Detect Real Faults

Search-based test generation, if effective at fault detection, can lower the cost of testing. These techniques rely on fitness functions to guide the search; ultimately, such functions represent test goals that approximate, but do not ensure, fault detection. This reliance on approximation raises two questions: can fitness functions produce effective tests, and, if so, which ones should be used to generate them? To answer these questions, we assessed the fault-detection capabilities of the EvoSuite framework and eight of its fitness functions on 353 real faults from the Defects4J database. Our analysis found that the strongest indicator of effectiveness is a high level of attained code coverage; consequently, the branch coverage fitness function is the most effective of those studied. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives, supported by secondary fitness functions that vary the scenarios explored.
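
To make the notion of fitness-guided search concrete, here is a minimal, self-contained sketch of a branch-distance fitness computation in the style used by search-based tools such as EvoSuite. The class and method names are hypothetical, and the distance formulas (with the common d/(d+1) normalization) are deliberately simplified; this is an illustration of the technique, not EvoSuite's actual implementation.

```java
// Illustrative branch-distance fitness sketch for search-based test
// generation. All names here are hypothetical, chosen for exposition.
public final class BranchFitness {

    // Distance to making the condition (a <= b) true: 0 if already true,
    // otherwise it grows with how far the values are from satisfying it.
    static double distanceLessOrEqual(double a, double b) {
        return a <= b ? 0.0 : (a - b) + 1.0;
    }

    // Distance to making the condition (a == b) true.
    static double distanceEquals(double a, double b) {
        return Math.abs(a - b);
    }

    // Normalize a raw distance into [0, 1) so that distances from
    // different branches are comparable: the d / (d + 1) scheme common
    // in search-based testing.
    static double normalize(double d) {
        return d / (d + 1.0);
    }

    // Fitness of a candidate test suite: the sum of normalized distances
    // to each branch the suite has not yet covered. Lower is better;
    // 0 means the suite achieves full branch coverage.
    static double fitness(double[] rawDistancesToUncoveredBranches) {
        double sum = 0.0;
        for (double d : rawDistancesToUncoveredBranches) {
            sum += normalize(d);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Suppose executing a candidate suite left two branches uncovered,
        // with raw distances 4.0 and 0.5 (hypothetical values).
        double[] uncovered = {4.0, 0.5};
        System.out.printf("fitness = %.3f%n", fitness(uncovered));
    }
}
```

Under this formulation the search minimizes the fitness value: a suite that comes numerically closer to flipping each uncovered branch predicate scores lower, which gives the optimizer a gradient toward full branch coverage even before any new branch is actually taken.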
