Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities

We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment, so we need confidence that our models do not rely too heavily on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g. to shape, location, texture, or colour) of the inputs to image classifiers. We produce these changes by making small adjustments to the activation values at different layers of a trained generative neural network: perturbing layers earlier in the generator changes coarser-grained features, while perturbing later layers changes finer-grained ones. Unsurprisingly, we find that state-of-the-art classifiers are not robust to such changes at any granularity. More surprisingly, for coarse-grained feature changes we find that adversarial training against pixel-space perturbations is not merely unhelpful: it is counterproductive.
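To make the mechanism concrete, the sketch below shows one way such an activation-space perturbation could be searched for, assuming a PyTorch-style generator exposed as a list of stages and a differentiable classifier. The function name, the norm-penalised untargeted objective, and all hyperparameters are illustrative assumptions, not the paper's released code or exact formulation.

```python
# Illustrative sketch only: `stages`, `classifier`, and the objective below are
# assumptions, not the paper's actual implementation or loss.
import torch
import torch.nn.functional as F


def perturb_generator_activations(stages, classifier, z, layer_idx,
                                  steps=200, lr=0.01, reg=10.0):
    """Search for a small additive perturbation to the activations produced by
    generator stage `layer_idx` such that the classifier's prediction for the
    generated image changes, while a norm penalty keeps the perturbation small.

    `stages` is assumed to be a list of modules whose composition is the
    generator; `z` is a single latent vector (batch size 1); the generator and
    the classifier are assumed to be trained, frozen, and in eval mode.
    """
    with torch.no_grad():
        # Run the generator up to and including the chosen stage.
        h = z
        for stage in stages[:layer_idx + 1]:
            h = stage(h)
        # Finish the forward pass to get the unperturbed image and its label.
        x0 = h
        for stage in stages[layer_idx + 1:]:
            x0 = stage(x0)
        orig_label = classifier(x0).argmax(dim=1)

    delta = torch.zeros_like(h, requires_grad=True)  # activation perturbation
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        # Resume the generator from the perturbed activations.
        x = h + delta
        for stage in stages[layer_idx + 1:]:
            x = stage(x)
        logits = classifier(x)
        if logits.argmax(dim=1).item() != orig_label.item():
            break  # the classifier's prediction has changed
        # Untargeted objective: push the prediction away from the original
        # label while penalising the size of the activation adjustment.
        loss = -F.cross_entropy(logits, orig_label) + reg * delta.norm()
        opt.zero_grad()
        loss.backward()
        opt.step()

    return x.detach(), orig_label.item(), logits.argmax(dim=1).item()
```

With this framing, a small `layer_idx` perturbs an early stage of the generator and so tends to change coarse-grained features such as shape or location, while a large `layer_idx` tends to change fine-grained features such as texture or colour.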
