On Comparing Mutation Testing Tools through Learning-based Mutant Selection

Recently, many mutation testing tools have been proposed that rely on bug-fix patterns and natural language models trained on large code corpora. Because these tools operate fundamentally differently from traditional grammar-based approaches, the question arises of how they compare in terms of 1) fault detection and 2) cost-effectiveness. At the same time, mutation testing research has proposed machine learning-based mutant selection approaches to mitigate the cost of applying mutation testing. This raises another question: how do existing mutation testing tools compare when guided by mutant selection approaches? To answer these questions, we compare four existing tools: μBERT (uses a pre-trained language model for fault seeding), IBIR (relies on inverted fix patterns), DeepMutation (generates mutants using Neural Machine Translation), and PIT (applies standard grammar-based rules). We compare them in terms of fault detection capability and cost-effectiveness, in conjunction with standard and deep learning-based mutant selection strategies. Our results show that IBIR has the highest fault detection capability among the four tools; however, it is not the most cost-effective when considering different selection strategies. In contrast, μBERT, despite its relatively lower fault detection capability, is the most cost-effective of the four. Our results also indicate that comparing mutation testing tools under deep learning-based mutant selection strategies can lead to different conclusions than comparing them under standard mutant selection. For instance, combining μBERT with deep learning-based mutant selection yields 12% higher fault detection than the other considered tools.
