Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As the victim, we consider RL agents whose objective is to find a policy that maximizes the average reward in undiscounted infinite-horizon settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} under different measures of attack cost. We give sufficient technical conditions under which the attack is feasible and provide lower and upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting, where the agent plans in the poisoned environment, and (ii) an \emph{online} setting, where the agent learns a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions, highlighting a significant security threat to reinforcement learning agents in practice.
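To make the attack formulation concrete, below is a minimal sketch of the offline reward-poisoning problem in the average-reward setting. Everything in it is illustrative rather than the paper's exact method: it assumes an ergodic MDP with a deterministic target policy, uses cvxpy as an assumed solver dependency, and enforces the target policy's optimality only against its single-state "neighbor" policies with margin eps; the helper names (poison_rewards, stationary_dist) are ours. The key observation it exploits is that a policy's stationary distribution does not depend on the rewards, so each policy's average reward is linear in the poisoned rewards and the attack reduces to a linear program.

```python
import numpy as np
import cvxpy as cp  # assumed dependency; any LP solver would work


def stationary_dist(P, pi):
    """Stationary distribution of the chain induced by deterministic policy pi.

    P has shape (S, A, S); pi[s] is the action taken in state s.
    Assumes the induced Markov chain is ergodic."""
    S = P.shape[0]
    P_pi = np.stack([P[s, pi[s]] for s in range(S)])  # (S, S) transition matrix
    # Solve mu^T P_pi = mu^T together with sum(mu) = 1 (least squares).
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.append(np.zeros(S), 1.0)
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


def poison_rewards(P, R, pi_target, eps=0.1):
    """Illustrative sketch: find poisoned rewards R_hat, minimally far from R
    in L1 norm, under which pi_target's average reward beats that of every
    'neighbor' policy (pi_target with one state's action swapped) by eps."""
    S, A_n = R.shape
    R_hat = cp.Variable((S, A_n))

    def avg_reward(pi):
        # rho_pi = sum_s mu_pi[s] * R_hat[s, pi[s]] -- linear in R_hat.
        mu = stationary_dist(P, pi)
        return cp.sum(cp.multiply(mu, cp.hstack([R_hat[s, pi[s]] for s in range(S)])))

    rho_target = avg_reward(pi_target)
    constraints = []
    for s in range(S):
        for a in range(A_n):
            if a == pi_target[s]:
                continue
            pi = pi_target.copy()
            pi[s] = a  # neighbor policy: deviate in state s only
            constraints.append(rho_target >= avg_reward(pi) + eps)

    problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(R_hat - R))), constraints)
    problem.solve()
    return R_hat.value


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP where action 0 is originally better everywhere;
    # the attacker forces the target policy that always plays action 1.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(2), size=(2, 2))  # P[s, a] is a next-state distribution
    R = np.array([[1.0, 0.0], [1.0, 0.0]])
    R_hat = poison_rewards(P, R, pi_target=np.array([1, 1]), eps=0.1)
    print(R_hat)
```

The L1 cost keeps the problem a linear program; swapping in an L2 cost would yield a quadratic program, corresponding to a different measure of attack cost in the framework above.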
