Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As the victim, we consider RL agents whose objective is to find a policy that maximizes the average reward in undiscounted infinite-horizon settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} under different measures of attack cost. We give sufficient technical conditions under which the attack is feasible and provide lower and upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting, where the agent plans in the poisoned environment, and (ii) an \emph{online} setting, where the agent learns a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions, highlighting a significant security threat to reinforcement learning agents in practice.
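To make the attack formulation concrete, below is a minimal sketch of the offline reward-poisoning problem in the average-reward setting. Everything in it is illustrative rather than the paper's exact method: it assumes an ergodic MDP with a deterministic target policy, uses cvxpy as an assumed solver dependency, and enforces the target policy's optimality only against its single-state "neighbor" policies with margin eps; the helper names (poison_rewards, stationary_dist) are ours. The key observation it exploits is that a policy's stationary distribution does not depend on the rewards, so each policy's average reward is linear in the poisoned rewards and the attack reduces to a linear program.

```python
import numpy as np
import cvxpy as cp  # assumed dependency; any LP solver would work


def stationary_dist(P, pi):
    """Stationary distribution of the chain induced by deterministic policy pi.

    P has shape (S, A, S); pi[s] is the action taken in state s.
    Assumes the induced Markov chain is ergodic."""
    S = P.shape[0]
    P_pi = np.stack([P[s, pi[s]] for s in range(S)])  # (S, S) transition matrix
    # Solve mu^T P_pi = mu^T together with sum(mu) = 1 (least squares).
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.append(np.zeros(S), 1.0)
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


def poison_rewards(P, R, pi_target, eps=0.1):
    """Illustrative sketch: find poisoned rewards R_hat, minimally far from R
    in L1 norm, under which pi_target's average reward beats that of every
    'neighbor' policy (pi_target with one state's action swapped) by eps."""
    S, A_n = R.shape
    R_hat = cp.Variable((S, A_n))

    def avg_reward(pi):
        # rho_pi = sum_s mu_pi[s] * R_hat[s, pi[s]] -- linear in R_hat.
        mu = stationary_dist(P, pi)
        return cp.sum(cp.multiply(mu, cp.hstack([R_hat[s, pi[s]] for s in range(S)])))

    rho_target = avg_reward(pi_target)
    constraints = []
    for s in range(S):
        for a in range(A_n):
            if a == pi_target[s]:
                continue
            pi = pi_target.copy()
            pi[s] = a  # neighbor policy: deviate in state s only
            constraints.append(rho_target >= avg_reward(pi) + eps)

    problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(R_hat - R))), constraints)
    problem.solve()
    return R_hat.value


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP where action 0 is originally better everywhere;
    # the attacker forces the target policy that always plays action 1.
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(2), size=(2, 2))  # P[s, a] is a next-state distribution
    R = np.array([[1.0, 0.0], [1.0, 0.0]])
    R_hat = poison_rewards(P, R, pi_target=np.array([1, 1]), eps=0.1)
    print(R_hat)
```

The L1 cost keeps the problem a linear program; swapping in an L2 cost would yield a quadratic program, corresponding to a different measure of attack cost in the framework above.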
