Analysing and Comparing the Effectiveness of Mutation Testing Tools: A Manual Study

Mutation testing is considered as one of the most powerful testing methods. It operates by asking testers to design tests that reveal a set of mutants, which are purpose-made injected defects. Evidently, the strength of the method strongly depends on the used mutants. However, this dependence raises concerns regarding the mutation testing practice that is implemented by existing tools. Thus, it is probable that implementation inadequacies can lead to incompetent results. In this paper, we cross-evaluate three popular mutation testing tools for Java, namely MUJAVA, MAJOR and PIT, with respect to their effectiveness. We perform an empirical study of 3,324 manually analysed mutants from real-world projects and we find that there are large differences between the tools' effectiveness, ranging from 76% to 88%, with MUJAVA achieving the best results. We also demonstrate that no tool is able to subsume the others and provide practical recommendations on how to strengthen each one of the studied tools. Finally, our analysis shows that 11%, 12% and 7% of the mutants generated by MUJAVA, MAJOR and PIT are equivalent, respectively.

[1]  Marinos Kintis,et al.  Effective methods to tackle the equivalent mutant problem when testing software with mutation , 2016 .

[2]  Mike Papadakis,et al.  Evaluating Mutation Testing Alternatives: A Collateral Experiment , 2010, 2010 Asia Pacific Software Engineering Conference.

[3]  Mark Harman,et al.  An Analysis and Survey of the Development of Mutation Testing , 2011, IEEE Transactions on Software Engineering.

[4]  A. Jefferson Offutt,et al.  Empirical Evaluation of the Statement Deletion Mutation Operator , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[5]  Yves Le Traon,et al.  Trivial Compiler Equivalence: A Large Scale Empirical Study of a Simple, Fast and Effective Equivalent Mutant Detection Technique , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[6]  Gregg Rothermel,et al.  An experimental determination of sufficient mutant operators , 1996, TSEM.

[7]  András Márki,et al.  On strong mutation and subsuming mutants , 2016, 2016 IEEE Ninth International Conference on Software Testing, Verification and Validation Workshops (ICSTW).

[8]  Yves Le Traon,et al.  Threats to the validity of mutation-based test assessment , 2016, ISSTA.

[9]  Frederick C. Harris,et al.  Estimation and Enhancement of Real-Time Software Reliability Through Mutation Analysis , 1992, IEEE Trans. Computers.

[10]  A. Jefferson Offutt,et al.  Improving logic-based testing , 2013, J. Syst. Softw..

[11]  Yves Le Traon,et al.  Assessing and Improving the Mutation Testing Practice of PIT , 2016, 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST).

[12]  Mickaël Delahaye,et al.  Selecting a software engineering tool: lessons learnt from mutation analysis , 2015, Softw. Pract. Exp..

[13]  Ibrahim Habli,et al.  An Empirical Evaluation of Mutation Testing for Improving the Test Quality of Safety-Critical Software , 2013, IEEE Transactions on Software Engineering.

[14]  Nicos Malevris,et al.  MEDIC: A static analysis framework for equivalent mutant identification , 2015, Inf. Softw. Technol..

[15]  René Just,et al.  The major mutation framework: efficient and scalable mutation analysis for Java , 2014, ISSTA 2014.

[16]  A. Jefferson Offutt,et al.  MuJava: an automated class mutation system , 2005, Softw. Test. Verification Reliab..

[17]  A. Jefferson Offutt,et al.  A mutation carol: Past, present and future , 2011, Inf. Softw. Technol..

[18]  Mark Harman,et al.  A study of equivalent and stubborn mutation operators using human analysis of equivalence , 2014, ICSE.

[19]  Mike Papadakis,et al.  Automatic Mutation Test Case Generation via Dynamic Symbolic Execution , 2010, 2010 IEEE 21st International Symposium on Software Reliability Engineering.

[20]  A. Jefferson Offutt,et al.  Establishing Theoretical Minimal Sets of Mutants , 2014, 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation.

[21]  A. Jefferson Offutt,et al.  Introduction to Software Testing , 2008 .

[22]  René Just,et al.  Higher accuracy and lower run time: efficient mutation analysis using non‐redundant mutation operators , 2015, Softw. Test. Verification Reliab..

[23]  Alex Groce,et al.  Does choice of mutation tool matter? , 2016, Software Quality Journal.

[24]  Dana Angluin,et al.  Two notions of correctness and their relation to testing , 1982, Acta Informatica.

[25]  Lionel C. Briand,et al.  Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria , 2006, IEEE Transactions on Software Engineering.

[26]  A. Jefferson Offutt,et al.  An Experimental Comparison of Four Unit Test Criteria: Mutation, Edge-Pair, All-Uses and Prime Path Coverage , 2009, 2009 International Conference on Software Testing, Verification, and Validation Workshops.

[27]  Sunil Kumar Khatri,et al.  Experimental comparison of automated mutation testing tools for java , 2015, 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions).