Assessment and Improvement of the Practical Use of Mutation for Automated Software Testing

Software testing is the main quality assurance technique used in software engineering. Software companies and open-source communities alike actively integrate testing into their software development life cycle. To guide the testing process and give it objectives, researchers have designed test adequacy criteria (TACs), which define the properties of a program that must be covered for a test suite to be considered thorough. Many TACs have been proposed in the literature, among them the widely used statement and branch criteria and the fault-based criterion known as mutation. Mutation has been shown to be effective at revealing faults in software; nevertheless, its adoption in practice still lags because of its cost. Ideally, testing should use the TACs most likely to lead to high fault revelation, and the fault revelation of a test suite is expected to increase as its coverage of a TAC's test objectives increases. However, the question of which TAC best guides software testing towards fault revelation remains controversial and open, and the relationship between the coverage of a TAC's test objectives and fault revelation remains unknown. To address these issues, this dissertation presents an empirical study that evaluates the relationship between test objective coverage and fault revelation for four TACs (statement coverage, branch coverage, weak mutation and strong mutation). The study shows that fault revelation increases with coverage only beyond some coverage threshold, and that strong mutation has the highest fault revelation. Despite the higher fault revelation that strong mutation provides, software practitioners are still reluctant to integrate it into their testing activities.
This reluctance stems mainly from the high cost of mutation analysis, which is due to the large number of mutants and to the limited automation of test generation for strong mutation. Several approaches have been proposed in the literature to tackle the cost of strong mutation analysis. Mutant selection (reduction) approaches aim to reduce the number of mutants used for testing by selecting a small subset of mutation operators to apply during mutant generation, thereby reducing the number of mutants to analyze. Nevertheless, those approaches are no more effective, with respect to fault revelation, than random mutant sampling (which leads to a high loss in fault revelation). Moreover, little work in the literature addresses cost-effective automated test generation for strong mutation. This dissertation proposes two techniques, FaRM and SEMu, to reduce the cost of mutation testing. FaRM statically selects and prioritizes mutants that lead to faults (fault-revealing mutants) in order to reduce the number of mutants; fault-revealing mutants represent a very small proportion of the generated mutants. SEMu automatically generates tests that strongly kill mutants, thereby increasing the mutation score and improving the test suite. First, this dissertation presents an empirical study that evaluates the fault revelation (the ability to lead to tests that have high fault revelation) of four TACs, namely statement, branch, weak mutation and strong mutation. The study provides evidence that, for all four studied TACs, fault revelation increases with the coverage of test objectives only beyond a certain coverage threshold. This suggests the need to attain higher coverage during testing. Moreover, the study shows that strong mutation is the only studied TAC that leads to tests that have, significantly,
