Targeted Attacks on Deep Reinforcement Learning Agents through Adversarial Observations

This paper deals with adversarial attacks on the perception of neural network policies in the Reinforcement Learning (RL) context. While previous approaches perform untargeted attacks on the state of the agent, we propose a method to perform targeted attacks that lure an agent into consistently following a desired policy. We place ourselves in a realistic setting where attacks are performed on observations of the environment rather than on the internal state of the agent, and we develop constant attacks instead of per-observation ones. We illustrate our method by attacking deep RL agents playing Atari games and show that universal additive masks can be applied not only to degrade performance but also to take control of an agent.
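To make the idea concrete, here is a minimal sketch of learning a universal additive mask that pushes a victim policy toward an adversary-chosen target policy. This is an illustrative reconstruction, not the paper's exact optimization: the names `train_universal_mask`, `policy_net`, and `target_policy` are hypothetical, and the L-infinity projection and cross-entropy objective are assumptions.

```python
import torch
import torch.nn.functional as F

def train_universal_mask(policy_net, target_policy, observations,
                         epsilon=8.0 / 255, steps=500, lr=1e-2):
    """Optimize a single additive mask (hypothetical sketch) so that the
    attacked policy's action choices match an adversary's target policy.

    policy_net    : victim network mapping observations to action logits
    target_policy : function mapping observations to desired action indices
    observations  : tensor of shape (N, C, H, W), pixel values in [0, 1]
    """
    # One mask shared across all observations (the "universal" part).
    delta = torch.zeros_like(observations[0], requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        perturbed = (observations + delta).clamp(0.0, 1.0)
        logits = policy_net(perturbed)
        targets = target_policy(observations)      # actions the attacker wants
        loss = F.cross_entropy(logits, targets)    # push the argmax toward them
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the mask within an L-infinity ball of radius epsilon.
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)

    return delta.detach()
```

Once trained, the same `delta` would be added to every observation at test time, which matches the abstract's notion of a constant (rather than per-observation) attack.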
