Choosing the fitness function for the job: Automated generation of test suites that detect real faults

Search-based unit test generation, if effective at fault detection, can lower the cost of testing. Such techniques rely on fitness functions to guide the search; ultimately, these functions represent test goals that approximate, but do not ensure, fault detection. This reliance on approximation raises two questions: can fitness functions produce effective tests, and, if so, which ones should be used to generate tests? To answer these questions, we have assessed the fault-detection capabilities of unit test suites generated to satisfy eight white-box fitness functions on 597 real faults from the Defects4J database. Our analysis found that the strongest indicators of effectiveness are a high level of code coverage over the targeted class and high satisfaction of a criterion's obligations; consequently, the branch coverage fitness function is the most effective. Our findings indicate that fitness functions that thoroughly explore system structure should be used as primary generation objectives, supported by secondary fitness functions that explore orthogonal, supporting scenarios. Our results also provide further evidence that future approaches to test generation should focus on attaining higher coverage of private code and on better initialization and manipulation of class dependencies.
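
To make the guidance role of a fitness function concrete, the sketch below scores a candidate test suite against a set of branch coverage obligations, the criterion our results favor. It is a minimal illustration under assumptions common in the search-based testing literature, not the implementation of any particular tool: the types BranchGoal and ExecutionTrace, the method names, and the dist/(dist+1) normalization are all hypothetical choices made for this example.

```java
import java.util.List;

/**
 * Minimal sketch of a branch coverage fitness function for search-based
 * unit test generation. All names here are hypothetical illustrations.
 */
public final class BranchCoverageFitness {

    /** Placeholder for the dynamic information recorded while executing a test. */
    public interface ExecutionTrace { }

    /** One coverage obligation: execute a particular branch of the class under test. */
    public interface BranchGoal {
        /**
         * Smallest branch distance observed for this goal across the suite's
         * executions, e.g. |a - b| for an unsatisfied condition "a == b".
         * A distance of 0 means the branch was covered.
         */
        double minDistance(List<ExecutionTrace> traces);
    }

    /** Normalize a raw branch distance into [0, 1), preserving its ordering. */
    static double normalize(double distance) {
        return distance / (distance + 1.0);
    }

    /**
     * Fitness to MINIMIZE: 0 when every branch obligation is satisfied.
     * Summing normalized distances rewards partial progress toward
     * still-uncovered branches, which is what steers the search.
     */
    static double suiteFitness(List<BranchGoal> goals, List<ExecutionTrace> traces) {
        double fitness = 0.0;
        for (BranchGoal goal : goals) {
            fitness += normalize(goal.minDistance(traces));
        }
        return fitness;
    }
}
```

In a genetic algorithm, each candidate suite in the population would be executed, scored with a function of this shape, and suites with lower fitness would be preferred for selection and mutation, since a fitness of zero indicates that every branch obligation has been satisfied.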
