DeepGauge: Comprehensive and Multi-Granularity Testing Criteria for Gauging the Robustness of Deep Learning Systems

Deep learning (DL) defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neural network from a set of training data. DL has been widely adopted in many safety-critical scenarios. However, a plethora of studies have shown that state-of-the-art DL systems suffer from various vulnerabilities that can lead to severe consequences when deployed in real-world applications. Currently, the robustness of a DL system against adversarial attacks is usually measured by its accuracy on test data. Given the limited test data that are typically accessible, good performance on them can hardly guarantee the robustness and generality of DL systems. Unlike traditional software systems, which have clear and controllable logic and functionality, a DL system is trained from data and its internal logic is hard to understand thoroughly. This complicates system analysis and defect detection, and could hinder real-world deployment where safety guarantees are required. In this paper, we propose DeepGauge, a set of comprehensive, multi-granularity testing criteria for DL systems that renders a multi-faceted portrayal of the test bed. We demonstrate an in-depth evaluation of the proposed testing criteria on two well-known datasets and five DL systems, using four state-of-the-art adversarial data generation techniques. The effectiveness of DeepGauge sheds light on the construction of robust DL systems.
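
The abstract does not spell out the criteria concretely; as a rough illustration of the kind of neuron-level coverage criterion DeepGauge is built around, the sketch below computes a k-multisection-style neuron coverage in plain Python/NumPy. It assumes per-neuron activation matrices (inputs x neurons) have already been extracted from the network, and all function and variable names are illustrative only, not taken from the paper or its tooling.

import numpy as np

def k_multisection_coverage(train_acts, test_acts, k=10):
    # Fraction of per-neuron activation sections (k equal-width bins between each
    # neuron's min/max observed on training data) hit by at least one test input.
    # train_acts, test_acts: arrays of shape (n_inputs, n_neurons).
    low = train_acts.min(axis=0)    # per-neuron lower bound from the training profile
    high = train_acts.max(axis=0)   # per-neuron upper bound from the training profile
    width = np.where(high > low, (high - low) / k, 1.0)  # avoid zero-width sections
    covered = np.zeros((train_acts.shape[1], k), dtype=bool)
    for acts in test_acts:          # activations of one test input
        in_range = (acts >= low) & (acts <= high)
        bins = np.clip(((acts - low) / width).astype(int), 0, k - 1)
        covered[np.arange(acts.size)[in_range], bins[in_range]] = True
    return covered.sum() / covered.size

# Toy usage with random activations standing in for a profiled DL system.
rng = np.random.default_rng(0)
train_acts = rng.normal(size=(1000, 64))
test_acts = rng.normal(size=(200, 64))
print(f"k-multisection neuron coverage: {k_multisection_coverage(train_acts, test_acts):.2%}")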
