Predictive Mutation Analysis via Natural Language Channel in Source Code

Mutation analysis can provide valuable insights into both the System Under Test (SUT) and its test suite. However, it does not scale well, because building and testing a large number of mutants is costly. Predictive Mutation Testing (PMT) has been proposed to reduce the cost of mutation testing, but it can only provide statistical inference about whether a mutant will be killed by the entire test suite. We propose Seshat, a Predictive Mutation Analysis (PMA) technique that can accurately predict the entire kill matrix, not just the mutation score of the given test suite. Seshat exploits the natural language channel in code and learns, from a given kill matrix, the relationship between the syntactic and semantic concepts of each test case and the mutants it can kill. The learnt model can later be used to predict the kill matrices for subsequent versions of the program, even after both the source and test code have changed significantly. Empirical evaluation using the programs in Defects4J shows that Seshat can predict kill matrices with an average F-score of 0.83 for versions that are up to years apart. This is an F-score improvement of 0.14 and 0.45 points over the state-of-the-art predictive mutation testing technique and a simple coverage-based heuristic, respectively. Seshat also performs as well as PMT when predicting the mutation score only. Once Seshat has trained its model using a concrete mutation analysis, its subsequent predictions are on average 39 times faster than actual test-based analysis.
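To make the distinction between a kill matrix and a mutation score concrete, the following is a minimal sketch, not Seshat's implementation. It assumes a kill matrix is stored as a list of rows (one per mutant) of booleans (one per test case), and that the F-score is computed over individual (mutant, test) entries; the abstract does not state the exact granularity, so that choice is an assumption. The names `mutation_score`, `f_score`, `actual`, and `predicted` are hypothetical and the data is made up for illustration.

```python
from typing import List

# Assumed layout: kill[m][t] is True if test case t kills mutant m.
KillMatrix = List[List[bool]]


def mutation_score(kill: KillMatrix) -> float:
    """Fraction of mutants killed by at least one test case."""
    if not kill:
        return 0.0
    killed = sum(1 for row in kill if any(row))
    return killed / len(kill)


def f_score(actual: KillMatrix, predicted: KillMatrix) -> float:
    """F1 over (mutant, test) entries, treating 'killed' as the positive class."""
    tp = fp = fn = 0
    for a_row, p_row in zip(actual, predicted):
        for a, p in zip(a_row, p_row):
            if a and p:
                tp += 1
            elif p and not a:
                fp += 1
            elif a and not p:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only: 3 mutants (rows) x 3 test cases (columns).
actual = [
    [True, False, False],
    [False, False, False],  # this mutant survives the whole suite
    [False, True, True],
]
predicted = [
    [True, False, False],
    [False, True, False],   # one false positive
    [False, True, False],   # one false negative
]

print("actual mutation score:", mutation_score(actual))
print("predicted mutation score:", mutation_score(predicted))
print("entry-level F-score:", f_score(actual, predicted))
```

The mutation score collapses each row to a single killed-or-survived bit, whereas the entry-level F-score compares every (mutant, test) pair, which is the finer-grained information a predicted kill matrix carries.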
