Are mutants really natural?: a study on how "naturalness" helps mutant selection

Background: Code is repetitive and predictable in a way that resembles natural language. This means that code is "natural", and that this "naturalness" can be captured by natural language modelling techniques. Such models promise to capture program semantics and to identify source code parts that "smell", i.e., that are strange, badly written, and generally error-prone (likely to be defective). Aims: We investigate the use of natural language modelling techniques in mutation testing (a testing technique that uses artificial faults). We thus seek to identify how well artificial faults simulate real ones and, ultimately, to understand how natural artificial faults can be. Our intuition is that natural mutants, i.e., mutants that are predictable (that follow the implicit coding norms of developers), are semantically useful and generally valuable to testers. We also expect mutants located on unnatural code locations (which are generally linked with error-proneness) to be of higher value than those located on natural code locations. Method: Based on this idea, we propose mutant selection strategies that rank mutants according to a) their naturalness (the naturalness of the mutated code), b) the naturalness of their locations (the naturalness of the original program statements), and c) their impact on the naturalness of the code they apply to (the naturalness difference between the original and mutated statements). We empirically evaluate these strategies on a benchmark set of 5 open-source projects, involving more than 100k mutants and 230 real faults. Based on the fault set, we estimate the utility (i.e., the capability to reveal faults) of mutants selected on the basis of their naturalness, and compare it against the utility of randomly selected mutants. Results: Our analysis shows that there is no link between naturalness and the fault revelation utility of mutants. We also demonstrate that naturalness-based mutant selection performs similarly to (in fact, slightly worse than) random mutant selection. Conclusions: Our findings are negative, but we consider them interesting as they refute a strong intuition: fault revelation is independent of the mutants' naturalness.
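The three ranking signals described in the Method can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an add-alpha-smoothed bigram model over whitespace tokens (the actual study uses n-gram models with more sophisticated smoothing and tokenization), and the corpus, statements, and mutant shown are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigrams and bigrams over a tokenized training corpus."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams

def cross_entropy(tokens, unigrams, bigrams, vocab_size, alpha=1.0):
    """Average negative log2-probability of a token sequence under the
    bigram model, with add-alpha smoothing for unseen events.
    Higher cross-entropy means less "natural" code."""
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
        total -= math.log2(p)
    return total / max(len(tokens) - 1, 1)

# Toy "codebase" standing in for the training corpus (the study trains on
# large project corpora, not a handful of statements).
corpus = "if x > 0 : return x else : return 0 if y > 0 : return y".split()
unigrams, bigrams = train_bigram(corpus)
V = len(unigrams)

original = "if x > 0 : return x".split()    # original program statement
mutant   = "if x >= 0 : return x".split()   # hypothetical mutated statement

h_orig = cross_entropy(original, unigrams, bigrams, V)
h_mut = cross_entropy(mutant, unigrams, bigrams, V)

# The three signals from the abstract:
naturalness_of_mutant = h_mut            # a) entropy of the mutated code
naturalness_of_location = h_orig         # b) entropy of the original statement
impact_on_naturalness = h_mut - h_orig   # c) difference between the two
```

Ranking all mutants of a program by any one of these scores (ascending for "most natural first", descending for "most unnatural first") yields the selection strategies that the study compares against random selection.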
