A Theoretical and Empirical Study of Diversity-Aware Mutation Adequacy Criterion

Diversity has been widely studied in software testing as guidance for effectively sampling test inputs from the vast space of possible program behaviors. However, it has received relatively little attention in mutation testing. The traditional mutation adequacy criterion is a one-dimensional measure: the total number of killed mutants. We propose a novel, diversity-aware mutation adequacy criterion, called the distinguishing mutation adequacy criterion, which is fully satisfied when each of the considered mutants can be identified by the set of tests that kill it, thereby encouraging the inclusion of a more diverse range of tests. This paper presents the formal definition of the distinguishing mutation adequacy criterion and its score. An empirical study then investigates the relationship among distinguishing mutation score, fault detection capability, and test suite size. For adequate test suites that fully satisfy the criteria, the distinguishing mutation adequacy criterion detects 1.33 times more unseen faults than the traditional mutation adequacy criterion, at the cost of a 1.56 times increase in test suite size. The results are more favorable for inadequate test suites: on average, 8.63 times more unseen faults are detected at the cost of a 3.14 times increase in test suite size.
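The difference between the two criteria can be illustrated with a small sketch. It is one plausible reading of the abstract, not the paper's formal definition: the function names, the `kills` data layout, and in particular the way a partial score is computed (the fraction of mutants whose set of killing tests is non-empty and shared with no other mutant) are assumptions made for illustration.

```python
from collections import Counter

def traditional_score(kills):
    """Fraction of mutants killed by at least one test.

    `kills` maps a mutant name to the set of tests that kill it
    (a hypothetical layout chosen for this sketch).
    """
    return sum(1 for tests in kills.values() if tests) / len(kills)

def distinguishing_score(kills):
    """Fraction of mutants identifiable by their set of killing tests.

    Assumption: a mutant counts as distinguished when its kill set is
    non-empty (it differs from the original program) and no other
    mutant has exactly the same kill set.
    """
    signatures = {m: frozenset(tests) for m, tests in kills.items()}
    occurrences = Counter(signatures.values())
    distinguished = sum(
        1 for sig in signatures.values() if sig and occurrences[sig] == 1
    )
    return distinguished / len(kills)

# Two tests kill every mutant, but m1 and m2 are killed by the same test
# only, so they cannot be told apart.
before = {"m1": {"t1"}, "m2": {"t1"}, "m3": {"t2"}}

# Adding a test t3 that kills only m2 gives every mutant a unique kill set.
after = {"m1": {"t1"}, "m2": {"t1", "t3"}, "m3": {"t2"}}

print(traditional_score(before), distinguishing_score(before))
print(traditional_score(after), distinguishing_score(after))
```

In this toy example the traditional score is already at its maximum with two tests, while the distinguishing reading still asks for a third test. This mirrors the trade-off reported above between larger test suites and more detected faults.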
