DeepCrime: mutation testing of deep learning systems based on real faults

Deep Learning (DL) solutions are increasingly adopted, but how to test them remains a major open research problem. Both existing and new testing techniques, including mutation testing, have been adapted to or proposed for DL systems. However, no approach has investigated the possibility of simulating the effects of real DL faults by means of mutation operators. We have defined 35 DL mutation operators based on 3 empirical studies of real faults in DL systems. We followed a systematic process to extract the mutation operators from the existing fault taxonomies, with a formal conflict-resolution phase in case of disagreement. We have implemented 24 of these DL mutation operators in DeepCrime, the first source-level pre-training mutation tool based on real DL faults. We have assessed our mutation operators to understand their characteristics, in particular whether they produce interesting mutations, i.e., mutations that are killable but not trivial. Finally, we have compared the sensitivity of our tool to changes in the quality of test data with that of DeepMutation++, an existing post-training DL mutation tool.
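
To make the notion of a pre-training mutation more concrete, the following is a minimal sketch, assuming a hypothetical operator that reassigns a fraction of the training labels to a wrong class before the model is trained. The function name `change_labels` and its parameters (`fraction`, `num_classes`, `seed`) are illustrative assumptions and are not taken from DeepCrime's actual interface; the sketch only shows how a fault can be injected into the training inputs rather than into an already trained model.

```python
import random


def change_labels(labels, fraction, num_classes, seed=0):
    """Return a copy of `labels` with a `fraction` of entries set to a different class.

    Hypothetical pre-training mutation operator: the fault is injected into
    the training data, so the mutant model is obtained by training on it.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("need at least two classes to pick a wrong label")

    rng = random.Random(seed)          # local RNG keeps the mutation reproducible
    mutated = list(labels)             # never modify the original training labels
    n_mutate = round(fraction * len(mutated))

    # Pick distinct positions and replace each label with a different class.
    for i in rng.sample(range(len(mutated)), n_mutate):
        wrong_classes = [c for c in range(num_classes) if c != mutated[i]]
        mutated[i] = rng.choice(wrong_classes)
    return mutated


# Illustrative usage on synthetic labels for a 10-class problem.
original = [i % 10 for i in range(100)]
mutant = change_labels(original, fraction=0.25, num_classes=10, seed=42)
changed = sum(a != b for a, b in zip(original, mutant))
print(f"{changed} of {len(original)} labels were mutated")
```

In this sketch, the `fraction` parameter controls how severe the injected fault is. A model trained on `mutant` would then be compared against one trained on `original` to decide whether the test data can tell the two apart.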
