Scalable Inference of Symbolic Adversarial Examples

We present a novel method for generating symbolic adversarial examples: input regions guaranteed to contain only adversarial examples for the given neural network. Because each region summarizes trillions of concrete adversarial examples, it can be used to generate real-world adversarial examples. We show theoretically that computing optimal symbolic adversarial examples is computationally expensive, and we present a method that approximates them in a scalable manner. Our method first selectively uses adversarial attacks to generate a candidate region and then prunes this region with hyperplanes that fit points obtained via specialized sampling. It iterates until arriving at a symbolic adversarial example for which it can prove, via state-of-the-art convex relaxation techniques, that the region contains only adversarial examples. Our experimental results demonstrate that our method is practically effective: it needs only a few thousand attacks to infer symbolic summaries guaranteed to contain $\approx 10^{258}$ adversarial examples.
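
As a rough illustration of the loop the abstract sketches (attack, candidate region, pruning, certification), the following minimal Python example runs the same steps under strong simplifications: a hand-written two-dimensional linear classifier stands in for the neural network, a random-search attack for a real adversarial attack, axis-aligned cuts for the paper's hyperplane pruning, and interval bound propagation for the convex-relaxation verifier. All names, parameters, and fallbacks here are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
W = np.array([[1.0, -2.0],      # toy "network": logits(x) = W @ x
              [-1.5, 0.5]])
x0 = np.array([0.2, 0.0])        # clean input
true_label = int(np.argmax(W @ x0))
eps = 0.5                        # L_inf perturbation budget

def is_adversarial(x):
    return int(np.argmax(W @ x)) != true_label

def attack(center, radius, tries=1000):
    """Crude random-search attack inside the L_inf ball (stand-in for PGD)."""
    for _ in range(tries):
        x = center + rng.uniform(-radius, radius, size=center.shape)
        if is_adversarial(x):
            return x
    return None

def certified_all_adversarial(lo, hi):
    """Interval bounds over the box [lo, hi]: certified if some wrong class's
    logit lower bound beats the true class's logit upper bound everywhere."""
    lower = np.minimum(W * lo, W * hi).sum(axis=1)
    upper = np.maximum(W * lo, W * hi).sum(axis=1)
    return any(lower[j] > upper[true_label]
               for j in range(W.shape[0]) if j != true_label)

# 1) Attack to obtain an adversarial seed and a candidate box around it.
adv = attack(x0, eps)
assert adv is not None, "attack failed; no seed to grow a region from"
lo = np.maximum(adv - eps, x0 - eps)   # keep the box inside the threat model
hi = np.minimum(adv + eps, x0 + eps)

# 2) Iterate: sample, prune non-adversarial samples out, try to certify.
for step in range(200):
    if certified_all_adversarial(lo, hi):
        print(f"certified after {step} iterations: box {lo} .. {hi}")
        break
    samples = rng.uniform(lo, hi, size=(500, 2))
    bad = [x for x in samples if not is_adversarial(x)]
    if bad:
        # Exclude one counterexample with an axis-aligned cut that keeps the
        # adversarial seed inside (the paper uses general hyperplanes instead).
        x = bad[0]
        k = int(np.argmax(np.abs(x - adv)))
        cut = 0.5 * (x[k] + adv[k])
        if x[k] > adv[k]:
            hi[k] = min(hi[k], cut)
        else:
            lo[k] = max(lo[k], cut)
    else:
        # No counterexample found but not yet certified: contract toward the
        # seed (a simple fallback, not the paper's refinement strategy).
        lo = adv + 0.9 * (lo - adv)
        hi = adv + 0.9 * (hi - adv)
else:
    print("budget exhausted without certification")

On this toy problem the interval bounds are exact for the linear classifier, so the loop terminates with a certified box; the actual method instead handles high-dimensional networks, prunes with general hyperplanes fit to sampled points, and certifies with much stronger convex relaxations.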
