DeepMutation: Mutation Testing of Deep Learning Systems

Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.

[1]  Mingyan Liu,et al.  Generating Adversarial Examples with Adversarial Networks , 2018, IJCAI.

[2]  Sebastian Thrun,et al.  Toward robotic cars , 2010, CACM.

[3]  Mark Harman,et al.  Constructing Subtle Faults Using Higher Order Mutation Testing , 2008, 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation.

[4]  Roy P. Pargas,et al.  Mutation Testing of Software Using MIMD Computer , 1992, ICPP.

[5]  Richard G. Hamlet,et al.  Testing Programs with the Aid of a Compiler , 1977, IEEE Transactions on Software Engineering.

[6]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[7]  Tao Xie,et al.  Binary mutation testing through dynamic translation , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).

[8]  Anna Derezinska,et al.  Improving Mutation Testing Process of Python Programs , 2015, CSOC.

[9]  Wynne Hsu,et al.  DESIGN OF MUTANT OPERATORS FOR THE C PROGRAMMING LANGUAGE , 2006 .

[10]  R.A. DeMillo,et al.  An extended overview of the Mothra software testing environment , 1988, [1988] Proceedings. Second Workshop on Software Testing, Verification, and Analysis.

[11]  Suman Jana,et al.  DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[12]  Gibson Adam,et al.  Deeplearning4j: Distributed, open-source deep learning for Java and Scala on Hadoop and Spark , 2016 .

[13]  A. Jefferson Offutt,et al.  Constraint-Based Automatic Test Data Generation , 1991, IEEE Trans. Software Eng..

[14]  Swarat Chaudhuri,et al.  AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[15]  Peter I. Corke,et al.  Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control , 2015, ICRA 2015.

[16]  Tao Xie,et al.  XEMU: an efficient QEMU based binary mutation testing framework for embedded software , 2012, EMSOFT '12.

[17]  Yong Rae Kwon,et al.  MuJava: an automated class mutation system: Research Articles , 2005 .

[18]  Fei-Fei Li,et al.  Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[20]  Andreas Zeller,et al.  The Impact of Equivalent Mutants , 2009, 2009 International Conference on Software Testing, Verification, and Validation Workshops.

[21]  A. Jefferson Offutt,et al.  Investigations of the software testing coupling effect , 1992, TSEM.

[22]  Jianping Wu,et al.  Mutation Testing of Protocol Messages Based on Extended TTCN-3 , 2008, 22nd International Conference on Advanced Information Networking and Applications (aina 2008).

[23]  Luca Pulina,et al.  An Abstraction-Refinement Approach to Verification of Artificial Neural Networks , 2010, CAV.

[24]  Lei Ma,et al.  DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[25]  Anna Derezinska Advanced mutation operators applicable in C# programs , 2006, SET.

[26]  Deepinder Sidhu,et al.  Fault coverage of protocol test methods , 1988, IEEE INFOCOM '88,Seventh Annual Joint Conference of the IEEE Computer and Communcations Societies. Networks: Evolution or Revolution?.

[27]  René Just,et al.  The major mutation framework: efficient and scalable mutation analysis for Java , 2014, ISSTA 2014.

[28]  Michael D. Ernst,et al.  Defects4J: a database of existing faults to enable controlled testing studies for Java programs , 2014, ISSTA 2014.

[29]  K. N. King,et al.  A fortran language system for mutation‐based software testing , 1991, Softw. Pract. Exp..

[30]  Hiroyuki Sato,et al.  GRT: Program-Analysis-Guided Random Testing (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[31]  Junfeng Yang,et al.  Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems , 2017, ArXiv.

[32]  Fabiano Cutigi Ferrari,et al.  Mutation Testing for Aspect-Oriented Programs , 2008, 2008 1st International Conference on Software Testing, Verification, and Validation.

[33]  Vernon Rego,et al.  High Performance Software Testing on SIMD Machines , 1991, IEEE Trans. Software Eng..

[34]  Anthony Ventresque,et al.  Demo: PIT a Practical Mutation Testing Tool for Java , 2016 .

[35]  Nikhil Ketkar,et al.  Deep Learning with Python , 2017 .

[36]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[37]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[38]  Timothy Alan Budd,et al.  Mutation analysis of program test data , 1980 .

[39]  J. Offutt,et al.  Mutation testing implements grammar-based testing , 2006, Second Workshop on Mutation Analysis (Mutation 2006 - ISSRE Workshops 2006).

[40]  A. Jefferson Offutt,et al.  MuJava: an automated class mutation system , 2005, Softw. Test. Verification Reliab..

[41]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[42]  Jian Pei,et al.  Exact and Consistent Interpretation for Piecewise Linear Neural Networks: A Closed Form Solution , 2018, KDD.

[43]  Junfeng Yang,et al.  DeepXplore: Automated Whitebox Testing of Deep Learning Systems , 2017, SOSP.

[44]  A. Jefferson Offutt The Coupling Effect: Fact or Fiction , 1989, Symposium on Testing, Analysis, and Verification.

[45]  Jean-Marc Jézéquel,et al.  Genes and bacteria for automatic test cases optimization in the .NET environment , 2002, 13th International Symposium on Software Reliability Engineering, 2002. Proceedings..

[46]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[47]  Mykel J. Kochenderfer,et al.  Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.

[48]  อนิรุธ สืบสิงห์,et al.  Data Mining Practical Machine Learning Tools and Techniques , 2014 .

[49]  Gordon Fraser,et al.  Whole Test Suite Generation , 2013, IEEE Transactions on Software Engineering.

[50]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[51]  Anna Derezinska Quality Assessment of Mutation Operators Dedicated for C# Programs , 2006, 2006 Sixth International Conference on Quality Software (QSIC'06).

[52]  Richard J. Lipton,et al.  The design of a prototype mutation system for program testing , 1899, AFIPS National Computer Conference.

[53]  David Wagner,et al.  Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods , 2017, AISec@CCS.

[54]  John A. Clark,et al.  Investigating the effectiveness of object‐oriented testing strategies using the mutation method , 2001, Softw. Test. Verification Reliab..

[55]  François Chollet,et al.  Deep Learning with Python , 2017 .

[56]  Corina S. Pasareanu,et al.  DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks , 2017, ArXiv.

[57]  Lei Ma,et al.  DeepGauge: Comprehensive and Multi-Granularity Testing Criteria for Gauging the Robustness of Deep Learning Systems , 2018, ArXiv.

[58]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[59]  Richard J. Lipton,et al.  Hints on Test Data Selection: Help for the Practicing Programmer , 1978, Computer.

[60]  Matthew Wicker,et al.  Feature-Guided Black-Box Safety Testing of Deep Neural Networks , 2017, TACAS.

[61]  Mark Harman,et al.  An Analysis and Survey of the Development of Mutation Testing , 2011, IEEE Transactions on Software Engineering.

[62]  Shing-Chi Cheung,et al.  Fault-based testing of database application programs with conceptual data model , 2005, Fifth International Conference on Quality Software (QSIC'05).

[63]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[64]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[65]  Daniel Kroening,et al.  Testing Deep Neural Networks , 2018, ArXiv.

[66]  Yadong Wang,et al.  Combinatorial Testing for Deep Learning Systems , 2018, ArXiv.

[67]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[68]  J Hayhurst Kelly,et al.  A Practical Tutorial on Modified Condition/Decision Coverage , 2001 .

[69]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.