Testing Deep Neural Networks

Deep neural networks (DNNs) have a wide range of applications, and software that employs them must be thoroughly tested, especially in safety-critical domains. However, traditional software test coverage metrics cannot be applied directly to DNNs. In this paper, inspired by the MC/DC coverage criterion, we propose a family of four novel test criteria tailored to the structural features of DNNs and their semantics. We validate the criteria by demonstrating that test inputs generated under their guidance are able to expose undesired behaviours in a DNN. Test cases are generated using a symbolic approach and a gradient-based heuristic search. Comparing against existing methods, we show that our criteria strike a balance between bug-finding ability (proxied using adversarial examples) and the computational cost of test case generation. Our experiments are conducted on state-of-the-art DNNs trained on popular open-source datasets, including MNIST, CIFAR-10 and ImageNet.
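
The criteria generalise MC/DC's notion of condition/decision interaction to pairs of neurons in adjacent layers: a "condition" neuron must change while a "decision" neuron in the next layer registers that change. As a rough illustration only, the sketch below (Python with NumPy) checks a sign-sign-style pair on a toy ReLU network and uses a naive gradient-based step to search for a covering second input. The network, the helpers ss_covered and gradient_search, and the search heuristic are all hypothetical simplifications, not the paper's symbolic or heuristic generators.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy fully connected ReLU network (hypothetical): 4 inputs -> 5 hidden -> 3 hidden.
    W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
    W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)

    def activations(x):
        # Pre-activation values of the two hidden layers.
        u1 = W1 @ x + b1
        u2 = W2 @ np.maximum(u1, 0) + b2
        return u1, u2

    def ss_covered(x1, x2, i, j):
        # Sign-sign-style coverage for the pair (layer-1 neuron i, layer-2 neuron j):
        # both flip sign between x1 and x2 while every other layer-1 neuron keeps
        # its sign, so the decision change is attributable to the condition neuron.
        u1a, u2a = activations(x1)
        u1b, u2b = activations(x2)
        cond_flips = np.sign(u1a[i]) != np.sign(u1b[i])
        others_fixed = all(np.sign(u1a[k]) == np.sign(u1b[k])
                           for k in range(len(u1a)) if k != i)
        dec_flips = np.sign(u2a[j]) != np.sign(u2b[j])
        return cond_flips and others_fixed and dec_flips

    def gradient_search(x1, i, j, lr=0.05, steps=200):
        # Crude gradient-style heuristic: the gradient of the condition neuron's
        # pre-activation w.r.t. the input is the weight row W1[i], so we step
        # against its current sign until the pair is covered; may fail and
        # return None, since other neurons can flip along the way.
        u1, _ = activations(x1)
        direction = -np.sign(u1[i]) * W1[i]
        x2 = x1.copy()
        for _ in range(steps):
            x2 = x2 + lr * direction
            if ss_covered(x1, x2, i, j):
                return x2
        return None

    x1 = rng.normal(size=4)
    x2 = gradient_search(x1, i=0, j=1)
    print("pair (0, 1) covered:", x2 is not None)

A real test generator would iterate over all neuron pairs and report an overall coverage ratio; when the simple heuristic fails on a pair, a symbolic approach of the kind used in the paper can take over.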
