A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations

Recent work has shown that neural network-based vision classifiers are highly vulnerable to misclassification caused by imperceptible, adversarial perturbations of their inputs. These perturbations, however, are purely pixel-wise and constructed from loss-function gradients of either the attacked model or a surrogate. As a result, they tend to look artificial and contrived. This might suggest that vulnerability to slight input perturbations can only arise in a truly adversarial setting and is thus unlikely to be a problem in more benign contexts. In this paper, we provide evidence that this belief may be incorrect. To this end, we show that neural networks are already vulnerable to significantly simpler, and more likely to occur naturally, transformations of their inputs. Specifically, we demonstrate that rotations and translations alone suffice to significantly degrade the classification performance of neural network-based vision models across a spectrum of datasets. This remains the case even when these models are trained with appropriate data augmentation and are already robust against the canonical pixel-wise perturbations. Moreover, finding such a "fooling" transformation does not require any special access to the model or a surrogate: trying a small number of random rotation and translation combinations already has a significant effect. These findings suggest that our current neural network-based vision models might not be as reliable as we tend to assume.
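To make the "random rotations and translations" attack concrete, the following is a minimal sketch of such a black-box search, assuming a PyTorch classifier and torchvision. The function name `random_spatial_attack`, the number of trials, and the attack budget (roughly ±30° of rotation and a few pixels of translation) are illustrative assumptions, not the paper's exact protocol.

```python
import torch
import torchvision.transforms.functional as TF

def random_spatial_attack(model, image, label, num_trials=10,
                          max_angle=30.0, max_shift=3):
    """Try random rotation/translation combinations and return the first
    transformed input the model misclassifies, or (None, None).

    Assumptions (illustrative, not the paper's exact settings):
      - `model` is a PyTorch classifier taking a (1, C, H, W) tensor,
      - `image` is a (C, H, W) tensor and `label` the true class index,
      - the budget is +/- `max_angle` degrees and +/- `max_shift` pixels.
    """
    model.eval()
    with torch.no_grad():
        for _ in range(num_trials):
            # Sample a random rotation angle and integer pixel shifts.
            angle = float(torch.empty(1).uniform_(-max_angle, max_angle))
            dx = int(torch.randint(-max_shift, max_shift + 1, (1,)))
            dy = int(torch.randint(-max_shift, max_shift + 1, (1,)))

            # Rotate and translate; pixels leaving the frame are filled with a constant.
            transformed = TF.affine(image, angle=angle, translate=[dx, dy],
                                    scale=1.0, shear=[0.0], fill=[0.0])

            pred = model(transformed.unsqueeze(0)).argmax(dim=1).item()
            if pred != label:
                return transformed, (angle, dx, dy)
    return None, None
```

Because the search only queries the model's predictions, it needs no gradients or surrogate model; a handful of random trials per image is enough to expose the vulnerability the abstract describes.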
