Quantifying Adversarial Sensitivity of a Model as a Function of the Image Distribution

In this paper, we propose an adaptation to the area under the curve (AUC) metric to measure the adversarial robustness of a model over a particular ε-interval [ε₀, ε₁] (an interval of adversarial perturbation strengths) that facilitates comparisons across models when they have different initial ε₀ performance. This can be used to determine how adversarially sensitive a model is to different image distributions, and/or to measure how robust a model is relative to other models on the same distribution. We applied this adversarial robustness metric to MNIST, CIFAR-10, and a Fusion dataset (CIFAR-10 + MNIST), where trained models performed either a digit or object recognition task using a LeNet, ResNet50, or fully connected network (FullyConnectedNet) architecture, and found the following: 1) CIFAR-10 models are more adversarially sensitive than MNIST models; 2) pretraining with another image distribution sometimes carries over the adversarial sensitivity induced by that image distribution, contingent on the pretrained image manifold; 3) increasing the complexity of the image manifold increases the adversarial sensitivity of a model trained on that manifold, but the task also plays a role in the sensitivity. Collectively, our results imply non-trivial differences in the learned representation space of one perceptual system over another given its exposure to different image statistics (mainly objects vs. digits). Moreover, these results hold even when model systems are equalized to have the same level of performance, or when exposed to matched image statistics of fusion images but with different tasks.
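To make the metric concrete, the following is a minimal sketch of an AUC-style robustness score over an ε-interval. The specific normalization shown here (dividing by the clean ε₀ accuracy and the interval width) is an assumption chosen to illustrate how comparisons across models with different initial performance could be facilitated; it is not necessarily the paper's exact definition.

```python
import numpy as np

def adversarial_auc(eps, acc, normalize=True):
    """AUC of the accuracy-vs-epsilon curve over [eps[0], eps[-1]].

    eps: increasing perturbation strengths (eps[0] = clean evaluation).
    acc: model accuracy at each perturbation strength.

    With normalize=True the area is divided by the clean accuracy and
    the interval width, so a perfectly robust model scores 1.0 even if
    its initial accuracy differs from another model's (an illustrative
    normalization, assumed here rather than taken from the paper).
    """
    eps = np.asarray(eps, dtype=float)
    acc = np.asarray(acc, dtype=float)
    # Trapezoidal rule, written out explicitly for clarity.
    area = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(eps))
    if normalize:
        area /= acc[0] * (eps[-1] - eps[0])
    return float(area)
```

For example, a model whose accuracy is flat across the whole ε-interval scores 1.0, while a model whose accuracy decays toward zero scores closer to 0, regardless of where the two models started at ε₀.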
