Scalable Inference of Symbolic Adversarial Examples

We present a novel method for generating symbolic adversarial examples: input regions guaranteed to contain only adversarial examples for the given neural network. Because each region summarizes trillions of concrete adversarial examples, it can be used to generate real-world adversarial examples. We show theoretically that computing optimal symbolic adversarial examples is computationally expensive, and we present a method that approximates them in a scalable manner. Our method first selectively uses adversarial attacks to generate a candidate region and then prunes this region with hyperplanes that fit points obtained via specialized sampling. It iterates until arriving at a symbolic adversarial example for which it can prove, via state-of-the-art convex relaxation techniques, that the region contains only adversarial examples. Our experimental results demonstrate that our method is practically effective: it needs only a few thousand attacks to infer symbolic summaries guaranteed to contain $\approx 10^{258}$ adversarial examples.
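
As a rough illustration of the loop the abstract sketches (attack, candidate region, pruning, certification), the following minimal Python example runs the same steps under strong simplifications: a hand-written two-dimensional linear classifier stands in for the neural network, a random-search attack for a real adversarial attack, axis-aligned cuts for the paper's hyperplane pruning, and interval bound propagation for the convex-relaxation verifier. All names, parameters, and fallbacks here are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)
W = np.array([[1.0, -2.0],      # toy "network": logits(x) = W @ x
              [-1.5, 0.5]])
x0 = np.array([0.2, 0.0])        # clean input
true_label = int(np.argmax(W @ x0))
eps = 0.5                        # L_inf perturbation budget

def is_adversarial(x):
    return int(np.argmax(W @ x)) != true_label

def attack(center, radius, tries=1000):
    """Crude random-search attack inside the L_inf ball (stand-in for PGD)."""
    for _ in range(tries):
        x = center + rng.uniform(-radius, radius, size=center.shape)
        if is_adversarial(x):
            return x
    return None

def certified_all_adversarial(lo, hi):
    """Interval bounds over the box [lo, hi]: certified if some wrong class's
    logit lower bound beats the true class's logit upper bound everywhere."""
    lower = np.minimum(W * lo, W * hi).sum(axis=1)
    upper = np.maximum(W * lo, W * hi).sum(axis=1)
    return any(lower[j] > upper[true_label]
               for j in range(W.shape[0]) if j != true_label)

# 1) Attack to obtain an adversarial seed and a candidate box around it.
adv = attack(x0, eps)
assert adv is not None, "attack failed; no seed to grow a region from"
lo = np.maximum(adv - eps, x0 - eps)   # keep the box inside the threat model
hi = np.minimum(adv + eps, x0 + eps)

# 2) Iterate: sample, prune non-adversarial samples out, try to certify.
for step in range(200):
    if certified_all_adversarial(lo, hi):
        print(f"certified after {step} iterations: box {lo} .. {hi}")
        break
    samples = rng.uniform(lo, hi, size=(500, 2))
    bad = [x for x in samples if not is_adversarial(x)]
    if bad:
        # Exclude one counterexample with an axis-aligned cut that keeps the
        # adversarial seed inside (the paper uses general hyperplanes instead).
        x = bad[0]
        k = int(np.argmax(np.abs(x - adv)))
        cut = 0.5 * (x[k] + adv[k])
        if x[k] > adv[k]:
            hi[k] = min(hi[k], cut)
        else:
            lo[k] = max(lo[k], cut)
    else:
        # No counterexample found but not yet certified: contract toward the
        # seed (a simple fallback, not the paper's refinement strategy).
        lo = adv + 0.9 * (lo - adv)
        hi = adv + 0.9 * (hi - adv)
else:
    print("budget exhausted without certification")

On this toy problem the interval bounds are exact for the linear classifier, so the loop terminates with a certified box; the actual method instead handles high-dimensional networks, prunes with general hyperplanes fit to sampled points, and certifies with much stronger convex relaxations.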
