MuDelta: Delta-Oriented Mutation Testing at Commit Time

To effectively test program changes using mutation testing, one needs to use mutants that are relevant to the altered program behaviours. In view of this, we introduce MuDelta, an approach that identifies commit-relevant mutants, that is, mutants that affect and are affected by the changed program behaviours. Our approach applies machine learning to a combined scheme of graph-based and vector-based representations of static code features. Our results, from 50 commits in 21 Coreutils programs, demonstrate the strong prediction ability of our approach, yielding AUC values of 0.80 (ROC) and 0.50 (PR curve), with precision and recall values of 0.63 and 0.32, respectively. These predictions are significantly better than random guesses (0.20 PR-curve AUC, with 0.21 precision and 0.21 recall), and subsequently lead to strong relevant tests that kill 45% more relevant mutants than those obtained with randomly sampled mutants (sampled either from those residing on the changed component(s) or from the changed lines). Our results also show that MuDelta selects mutants with 27% higher fault-revealing ability in fault-introducing commits. Taken together, our results corroborate the conclusion that commit-based mutation testing is suitable and promising for evolving software.
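As a reading aid for the precision and recall figures above, the following minimal sketch shows how the two metrics are computed for a binary "commit-relevant" prediction, and why the share of relevant mutants is the natural random baseline. It is not MuDelta's implementation; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool],
                     relevant: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) of a binary relevance prediction."""
    if len(predicted) != len(relevant):
        raise ValueError("predicted and relevant must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, r in zip(predicted, relevant) if p and r)
    fp = sum(1 for p, r in zip(predicted, relevant) if p and not r)
    fn = sum(1 for p, r in zip(predicted, relevant) if r and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def random_baseline_precision(relevant: Sequence[bool]) -> float:
    """Expected precision of uniform random selection: the share of relevant mutants."""
    return sum(1 for r in relevant if r) / len(relevant)


# Hypothetical toy data (not from the paper): 10 mutants, 3 of them commit-relevant.
relevant = [True, False, False, True, False, False, False, True, False, False]
# Hypothetical classifier output: 3 mutants flagged, 2 of them correctly.
predicted = [True, False, False, True, True, False, False, False, False, False]

precision, recall = precision_recall(predicted, relevant)
baseline = random_baseline_precision(relevant)
print(f"precision={precision:.2f} recall={recall:.2f} random baseline={baseline:.2f}")
```

In this toy setting the predictor flags three mutants and two of them are relevant, so both precision and recall are 2/3, against an expected precision of 0.30 for random selection.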
