Geometric Universality of Adversarial Examples in Deep Learning

Department of Computer Science and Technology, Tsinghua University, Beijing. Correspondence to: Haosheng Zou <zouhs16@mails.tsinghua.edu.cn>.

We consider the problem of adversarial examples in deep learning and provide geometric insights into their universality. Specifically, we define adversarial directions and prove results towards the universality of adversarial examples under few theoretical assumptions. Our results draw attention to the fully-connected layer serving as the last layer of most neural networks, which may be prone to adversarial examples and demands further research. A longer version with full proofs and discussions is provided with the submission email and also here.

Consider the softmax regression layer at the end of many popular neural networks for visual classification tasks (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2016; He et al., 2016), and the hidden space of the input neurons to the softmax layer. Denote this hidden space by H ⊆ R^l, let h ∈ H be the input vector to the softmax layer, let l be the number of neurons in the final hidden layer, and let m be the number of classes. We define the softmax function S : R^m → R^m by S(z)_i = exp(z_i) / Σ_{j=1}^{m} exp(z_j), where z ∈ R^m is the vector of logits, so the overall softmax layer can be written as S(Wh + b). The neural network classifier first maps an input image x to the hidden representation h = g(x) through the complex multi-layer non-linear function g : X → H, and then performs softmax regression to obtain the predicted label y = argmax_{i ∈ [m]} S(Wh + b)_i. We only show results for the case H = R^l here.

Definition 1. (Adversarial Direction) An adversarial direction at any h ∈ H is a direction d such that ∀θ ∈ R, S(W(h + θd) + b) = S(Wh + b), i.e., traversing arbitrarily far along d preserves the softmax output.

Such directions are adversarial in that no output difference can be observed under input manipulation of any magnitude along them, which opens up a wide region in H for potential adversarial examples. We further assume l > m, which is the case in almost all top-performing neural networks on ImageNet (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Szegedy et al., 2016; He et al., 2016), with l = 4096 (or l = 2048) and m = 1000 for the 1000 classes.

Figure 1. Left: illustration of a decision region (purple) that does not extend to ∞ when l < m. We show that in popular architectures, l > m and all decision regions extend to ∞, facilitating adversarial examples. Right: a "parallel" softmax layer, where two of the three decision regions determined by the pair of parallel hyperplanes are not adjacent (red and blue), making adversarial examples between those two classes harder to find.

Theorem 1. ∀h ∈ H, there exists a subspace V ⊆ R^l with dim(V) ≥ l − m, s.t. ∀d ∈ V, ∀θ ∈ R, S(W(h + θd) + b) = S(Wh + b). In particular, V can be taken as the null space of W: rank(W) ≤ m < l implies dim(null(W)) ≥ l − m, and any d ∈ null(W) leaves the logits Wh + b, and hence the softmax output, unchanged.

Theorem 2. (Universality of Adversarial Directions) For any region in H classified as a certain class, there always exists at least one direction such that points arbitrarily far along it are still classified as the same class with identical output probabilities (contrary to Fig. 1, left). For most g and softmax layers, we may even conjecture that adversarial examples can be found for any data pair.

Definition 2. A softmax layer is "parallel" if at least one pair of its decision boundaries is parallel (Fig. 1, right).
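The decision boundary between classes i and j in H is the hyperplane {h : (w_i − w_j)^T h + (b_i − b_j) = 0}, where w_i denotes the i-th row of W, so a layer is "parallel" exactly when two such normals w_i − w_j are collinear. The following minimal sketch checks this numerically (the helper name, tolerance, and test matrices are illustrative choices for this sketch, not taken from the longer version):

```python
# Illustrative check of Definition 2 ("parallel" softmax layers).
import itertools
import numpy as np

def is_parallel_softmax(W, tol=1e-8):
    # Unit normal of the decision boundary between classes i and j: w_i - w_j.
    pairs = itertools.combinations(range(W.shape[0]), 2)
    normals = [(W[i] - W[j]) / np.linalg.norm(W[i] - W[j]) for i, j in pairs]
    # Two boundaries are parallel iff their unit normals are (anti)collinear.
    return any(1.0 - abs(u @ v) < tol
               for u, v in itertools.combinations(normals, 2))

rng = np.random.default_rng(1)
print(is_parallel_softmax(rng.standard_normal((4, 8))))               # False (almost surely)
print(is_parallel_softmax(np.array([[0., 0.], [1., 0.], [2., 0.]])))  # True (Fig. 1, right)
```

For a generic W the check almost surely returns False, so the non-"parallel" assumption in the conjecture below is mild; the second call, with collinear rows, reproduces the pathological geometry of Fig. 1 (right).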
Conjecture. (Universality of Adversarial Examples) For most multi-layer non-linear mappings g : X → H and non-"parallel" softmax layers (W, b), for any data pair (x, y) and (x′, y′) with y ≠ y′, there exists an adversarial example x∗ with ‖x∗ − x‖_p ≤ ε for an imperceptible ε, such that g(x∗) = g(x′) + θd′, where θ ∈ R and d′ is an adversarial direction of class y′ in the space H.

Significance and Implications: We provide a deeper understanding of softmax regression, which is widely used in neural networks without question. The decision boundary of softmax regression is piecewise linear, and the decision region of each class is convex, which makes softmax regression simple and expectedly robust. However, we show that the decision regions are generally unbounded, which, combined with the non-linear preceding layers, likely leads to the universality of adversarial examples. Little work has been done on the final classification layer. Serving as theoretical evidence for preliminary work that already seeks substitute classification layers (Pang et al., 2018), this paper draws attention to the role of softmax layers in adversarial robustness.
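As a concluding numerical sanity check of Definition 1 and Theorem 1, the following minimal sketch builds adversarial directions from the null space of W and verifies that the softmax output is unchanged under arbitrarily large steps (dimensions, seed, and weights are illustrative rather than taken from any real network):

```python
# Minimal numerical sketch of Definition 1 and Theorem 1 (dimensions and
# random weights are illustrative; the ImageNet setting has l = 4096, m = 1000).
import numpy as np

l, m = 64, 10                       # hidden width l > number of classes m
rng = np.random.default_rng(0)
W = rng.standard_normal((m, l))     # softmax-layer weights
b = rng.standard_normal(m)          # softmax-layer biases
h = rng.standard_normal(l)          # an arbitrary hidden representation

def softmax(z):
    e = np.exp(z - z.max())         # shift by the max for numerical stability
    return e / e.sum()

# A basis of null(W) from the SVD: rank(W) <= m < l, so dim(null(W)) >= l - m,
# which is exactly the subspace V of Theorem 1.
_, s, vh = np.linalg.svd(W)
rank = int((s > 1e-10 * s[0]).sum())
null_basis = vh[rank:]              # rows span null(W)
assert null_basis.shape[0] >= l - m

d = null_basis[0]                   # one adversarial direction
base = softmax(W @ h + b)
for theta in (1.0, 1e3, 1e6):       # arbitrarily large steps along d
    assert np.allclose(softmax(W @ (h + theta * d) + b), base)
print("softmax output identical along d at every tested step size")
```

Because the invariance is exact in the logits, it holds for every θ, not just small perturbations; in floating point it is limited only by the accuracy of the computed null-space basis.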

References

[1] Simonyan, K. and Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2015.

[2] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the Inception Architecture for Computer Vision. CVPR, 2016.

[3] He, K., Zhang, X., Ren, S., and Sun, J. Deep Residual Learning for Image Recognition. CVPR, 2016.