Feature-Guided Black-Box Safety Testing of Deep Neural Networks

Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples require some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale-Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player's objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can converge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions under which no adversarial examples exist, yielding safety guarantees. Using Monte Carlo tree search, we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimisation procedures. Finally, we show how our method can be used to evaluate the robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.
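To make the feature-guided saliency step concrete, the sketch below (not the authors' implementation) shows one way to turn SIFT keypoints into a pixel-level saliency distribution that could guide black-box pixel manipulations. The use of OpenCV, the Gaussian spread tied to keypoint scale, and the response-based weighting are illustrative assumptions.

```python
# Minimal sketch: SIFT keypoints -> pixel saliency distribution.
# Assumes opencv-python (cv2.SIFT_create) and NumPy are available.
import cv2
import numpy as np

def saliency_distribution(image_bgr):
    """Return a probability distribution over pixels built from SIFT keypoints."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    keypoints = cv2.SIFT_create().detect(gray, None)

    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    saliency = np.full((h, w), 1e-12)            # small floor so every pixel stays reachable

    for kp in keypoints:
        x0, y0 = kp.pt                           # keypoint centre
        sigma = max(kp.size / 2.0, 1.0)          # spread tied to keypoint scale (assumption)
        weight = kp.response + 1e-6              # stronger features count more (assumption)
        saliency += weight * np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

    return saliency / saliency.sum()             # normalise to a probability distribution

# Usage: sample a pixel to manipulate in proportion to its saliency.
# img = cv2.imread("stop_sign.png")
# dist = saliency_distribution(img)
# flat_idx = np.random.choice(dist.size, p=dist.ravel())
# row, col = np.unravel_index(flat_idx, dist.shape)
```

In a game-tree search such as MCTS, sampling pixel manipulations from this distribution biases the exploration towards regions that matter to the human visual system, which is the intuition behind the feature-guided approach described above.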
