Adversarial attack and defense in reinforcement learning-from AI security view

Reinforcement learning (RL) is a core technology of modern artificial intelligence and has become a workhorse for AI applications ranging from Atari games to Connected and Automated Vehicle (CAV) systems. A reliable RL system is therefore the foundation of security-critical AI applications, a concern that is now more pressing than ever. However, recent studies have discovered that adversarial attacks, an intriguing attack mode, are also effective against neural network policies in the context of reinforcement learning, which has inspired innovative research in this direction. Hence, in this paper we make the first attempt to conduct a comprehensive survey of adversarial attacks on reinforcement learning from the perspective of AI security. We also briefly introduce the most representative defense technologies against existing adversarial attacks.
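To make the threat concrete, the sketch below shows an FGSM-style perturbation of an agent's observation. Everything here is illustrative and not from the survey: the linear softmax-free policy, the weight matrix `W`, the 4-dimensional observation, and the budget `eps` are all assumed for the toy example. For a linear policy the gradient of the chosen action's logit with respect to the observation is just the corresponding weight row, so a small step against its sign lowers that logit within an L-infinity budget, potentially changing the greedy action.

```python
import random

random.seed(0)

# Toy linear policy over a 4-dim observation with 3 discrete actions:
# logits[i] = dot(W[i], obs); the agent greedily picks the argmax.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
obs = [random.gauss(0, 1) for _ in range(4)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def act(o):
    logits = [dot(row, o) for row in W]
    return logits.index(max(logits))

def sign(x):
    return (x > 0) - (x < 0)

a = act(obs)

# FGSM-style untargeted perturbation: for this linear policy the
# gradient of the chosen action's logit w.r.t. the observation is
# exactly W[a]; stepping against its sign lowers that logit while
# keeping the perturbation within an L-infinity budget of eps.
eps = 0.5
adv_obs = [o - eps * sign(g) for o, g in zip(obs, W[a])]

print("clean action:", a, "adversarial action:", act(adv_obs))
```

With a deep policy the gradient would come from backpropagation rather than a weight row, but the attack structure (sign of the input gradient, scaled by a small budget) is the same.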
