Paper Information - Towards Robust Image Classification Using Sequential Attention Models

Towards Robust Image Classification Using Sequential Attention Models

In this paper we propose to augment a modern neural-network architecture with an attention model inspired by human perception. Specifically, we adversarially train and analyze a neural model incorporating a human-inspired visual attention component that is guided by a recurrent top-down sequential process. Our experimental evaluation uncovers several notable findings about the robustness and behavior of this new model. First, introducing attention to the model significantly improves adversarial robustness, resulting in state-of-the-art ImageNet accuracies under a wide range of random targeted attack strengths. Second, we show that by varying the number of attention steps (glances/fixations) for which the model is unrolled, we are able to make its defense capabilities stronger, even in light of stronger attacks, resulting in a "computational race" between the attacker and the defender. Finally, we show that some of the adversarial examples generated by attacking our model are quite different from conventional adversarial examples: they contain global, salient and spatially coherent structures coming from the target class that would be recognizable even to a human, and work by distracting the attention of the model away from the main object in the original image.

[1] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation , 1992.

[2] Harini Kannan,et al. Adversarial Logit Pairing , 2018, NIPS 2018.

[3] Han Zhang,et al. Self-Attention Generative Adversarial Networks , 2018, ICML.

[4] Seyed-Mohsen Moosavi-Dezfooli,et al. Universal Adversarial Perturbations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Tao Mei,et al. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6] Xiaogang Wang,et al. Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Qi Zhao,et al. Foveation-based Mechanisms Alleviate Adversarial Examples , 2015, ArXiv.

[8] Logan Engstrom,et al. Evaluating and Understanding the Robustness of Adversarial Logit Pairing , 2018, ArXiv.

[9] Moustapha Cissé,et al. Countering Adversarial Images using Input Transformations , 2018, ICLR.

[10] Alex Mott,et al. Towards Interpretable Reinforcement Learning Using Attention Augmented Agents , 2019, NeurIPS.

[11] Pieter R. Roelfsema,et al. Object-based attention in the primary visual cortex of the macaque monkey , 1998, Nature.

[12] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[14] Yuxin Peng,et al. The application of two-level attention models in deep convolutional neural network for fine-grained image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16] Xuelong Li,et al. MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning , 2017, IJCAI.

[17] Pushmeet Kohli,et al. Adversarial Robustness through Local Linearization , 2019, NeurIPS.

[18] Jian Yu,et al. Attention, Please! Adversarial Defense via Attention Rectification and Preservation , 2018, ArXiv.

[19] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine (doi: 10.1109/MSP.2012.2205597).

[20] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[21] Samy Bengio,et al. Adversarial Machine Learning at Scale , 2016, ICLR.

[22] Joan Bruna,et al. Intriguing properties of neural networks , 2013, ICLR.

[23] Daniel L. K. Yamins,et al. Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition , 2014, PLoS Comput. Biol..

[24] T. Poggio,et al. Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[25] Logan Engstrom,et al. Synthesizing Robust Adversarial Examples , 2017, ICML.

[26] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[27] Alex Bewley,et al. Hierarchical Attentive Recurrent Tracking , 2017, NIPS.

[28] B. Olshausen. 20 Years of Learning About Vision: Questions Answered, Questions Unanswered, and Questions Not Yet Asked , 2013.

[29] David A. Wagner,et al. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples , 2018, ICML.

[30] Jianfei Cai,et al. Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[31] Koray Kavukcuoglu,et al. Multiple Object Recognition with Visual Attention , 2014, ICLR.

[32] Alan L. Yuille,et al. Feature Denoising for Improving Adversarial Robustness , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Lauren E. Welbourne,et al. Humans, but Not Deep Neural Networks, Often Miss Giant Targets in Scenes , 2017, Current Biology.

[34] Tao Mei,et al. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Byoung-Tak Zhang,et al. Multi-focus Attention Network for Efficient Deep Reinforcement Learning , 2017, AAAI Workshops.

[36] Hung-yi Lee,et al. Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection , 2016, INTERSPEECH.

[37] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.

[38] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.

[39] S. Liversedge,et al. Saccadic eye movements and cognition , 2000, Trends in Cognitive Sciences.

[40] Diyi Yang,et al. Hierarchical Attention Networks for Document Classification , 2016, NAACL.

[41] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Jascha Sohl-Dickstein,et al. Adversarial Examples that Fool both Computer Vision and Time-Limited Humans , 2018, NeurIPS.

[43] Pushmeet Kohli,et al. Adversarial Risk and the Dangers of Evaluating Against Weak Attacks , 2018, ICML.

[44] Mo Shan,et al. A spatiotemporal model with visual attention for video classification , 2017, ArXiv.

[45] Daniel Baldauf,et al. Neural mechanisms of object-based attention , 2014.

[46] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[47] Aleksander Madry,et al. Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[48] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[49] David C. Van Essen,et al. Information Processing Strategies and Pathways in the Primate Visual System , 1995.

[50] Sungzoon Cho,et al. CRAM: Clued Recurrent Attention Model , 2018, ArXiv.

[51] Po-Sen Huang,et al. An Alternative Surrogate Loss for PGD-based Adversarial Testing , 2019, ArXiv.

[52] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[53] Dawn Song,et al. Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54] David A. Wagner,et al. Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[55] Yee Whye Teh,et al. Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects , 2018, NeurIPS.

[56] Po-Sen Huang,et al. Are Labels Required for Improving Adversarial Robustness? , 2019, NeurIPS.

[57] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.

[58] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.