DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems

Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. We have seen wide adoption of DL in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the testing adequacy of a DL system is usually measured by the accuracy of test data. Considering the limitation of accessible high quality test data, good accuracy performance on test data can hardly provide confidence to the testing adequacy and generality of DL systems. Unlike traditional software systems that have clear and controllable logic and functionality, the lack of interpretability in a DL system makes system analysis and defect detection difficult, which could potentially hinder its real-world deployment. In this paper, we propose DeepGauge, a set of multi-granularity testing criteria for DL systems, which aims at rendering a multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, and with four state-of-the-art adversarial attack techniques against DL. The potential usefulness of DeepGauge sheds light on the construction of more generic and robust DL systems.

[1]  Glenford J. Myers,et al.  Art of Software Testing , 1979 .

[2]  Hong Zhu,et al.  Software unit test coverage and adequacy , 1997, ACM Comput. Surv..

[3]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[4]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[5]  J Hayhurst Kelly,et al.  A Practical Tutorial on Modified Condition/Decision Coverage , 2001 .

[6]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[7]  Rajesh Subramanyan,et al.  A survey on model-based testing approaches: a systematic review , 2007, WEASELTech '07.

[8]  A. Jefferson Offutt,et al.  Book review: Introduction to Software Testing written by Paul Amman & Jeff Offutt. and published by CUP, 2008, 978-0-521-88038 322 pp., 0-471-20282-7 , 2008, SOEN.

[9]  Tony Gorschek,et al.  Searching for Cognitively Diverse Tests: Towards Universal Test Diversity Metrics , 2008, 2008 IEEE International Conference on Software Testing Verification and Validation Workshop.

[10]  Luca Pulina,et al.  An Abstraction-Refinement Approach to Verification of Artificial Neural Networks , 2010, CAV.

[11]  Mark Harman,et al.  An Analysis and Survey of the Development of Mutation Testing , 2011, IEEE Transactions on Software Engineering.

[12]  Luca Maria Gambardella,et al.  Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[13]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[14]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Gordon Fraser,et al.  Whole Test Suite Generation , 2013, IEEE Transactions on Software Engineering.

[16]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[17]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[18]  Benoit Baudry,et al.  The Multiple Facets of Software Diversity , 2014, ACM Comput. Surv..

[19]  Hiroyuki Sato,et al.  GRT: Program-Analysis-Guided Random Testing (T) , 2015, 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE).

[20]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[21]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[22]  Peter I. Corke,et al.  Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control , 2015, ICRA 2015.

[23]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[25]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[26]  Dan Boneh,et al.  The Space of Transferable Adversarial Examples , 2017, ArXiv.

[27]  Cyrille Artho,et al.  Model-Based API Testing of Apache ZooKeeper , 2017, 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST).

[28]  Junfeng Yang,et al.  DeepXplore: Automated Whitebox Testing of Deep Learning Systems , 2017, SOSP.

[29]  David Wagner,et al.  Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods , 2017, AISec@CCS.

[30]  Corina S. Pasareanu,et al.  DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks , 2017, ArXiv.

[31]  Mykel J. Kochenderfer,et al.  Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks , 2017, CAV.

[32]  Samy Bengio,et al.  Adversarial examples in the physical world , 2016, ICLR.

[33]  Junfeng Yang,et al.  Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems , 2017, ArXiv.

[34]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[35]  Zhendong Su,et al.  A Survey on Data-Flow Testing , 2017, ACM Comput. Surv..

[36]  Dawn Xiaodong Song,et al.  Decision Boundary Analysis of Adversarial Examples , 2018, ICLR.

[37]  Dan Boneh,et al.  Ensemble Adversarial Training: Attacks and Defenses , 2017, ICLR.

[38]  Matthias Bethge,et al.  Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models , 2017, ICLR.

[39]  Kamyar Azizzadenesheli,et al.  Stochastic Activation Pruning for Robust Adversarial Defense , 2018, ICLR.

[40]  Sameer Singh,et al.  Generating Natural Adversarial Examples , 2017, ICLR.

[41]  Sarfraz Khurshid,et al.  DeepRoad: GAN-based Metamorphic Autonomous Driving System Testing , 2018, ArXiv.

[42]  Bo Li,et al.  Automated poisoning attacks and defenses in malware detection systems: An adversarial machine learning approach , 2017, Comput. Secur..

[43]  Medhat A. Moussa,et al.  Attacking Binarized Neural Networks , 2017, ICLR.

[44]  David A. Wagner,et al.  Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples , 2018, ICML.

[45]  Martin Wattenberg,et al.  Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) , 2017, ICML.

[46]  Aditi Raghunathan,et al.  Certified Defenses against Adversarial Examples , 2018, ICLR.

[47]  Alan L. Yuille,et al.  Mitigating adversarial effects through randomization , 2017, ICLR.

[48]  Yanjun Qi,et al.  Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks , 2017, NDSS.

[49]  Lei Ma,et al.  DeepMutation: Mutation Testing of Deep Learning Systems , 2018, 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE).

[50]  Matthew Wicker,et al.  Feature-Guided Black-Box Safety Testing of Deep Neural Networks , 2017, TACAS.

[51]  Daniel Kroening,et al.  Concolic Testing for Deep Neural Networks , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[52]  Swarat Chaudhuri,et al.  AI2: Safety and Robustness Certification of Neural Networks with Abstract Interpretation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[53]  Suman Jana,et al.  DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[54]  Mingyan Liu,et al.  Spatially Transformed Adversarial Examples , 2018, ICLR.

[55]  Moustapha Cissé,et al.  Countering Adversarial Images using Input Transformations , 2018, ICLR.

[56]  Corina S. Pasareanu,et al.  DeepSafe: A Data-Driven Approach for Assessing Robustness of Neural Networks , 2018, ATVA.

[57]  Yang Song,et al.  PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples , 2017, ICLR.

[58]  Yadong Wang,et al.  Combinatorial Testing for Deep Learning Systems , 2018, ArXiv.

[59]  Daniel Kroening,et al.  Testing Deep Neural Networks , 2018, ArXiv.

[60]  Rama Chellappa,et al.  Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models , 2018, ICLR.

[61]  Bolei Zhou,et al.  Revisiting the Importance of Individual Units in CNNs via Ablation , 2018, ArXiv.