Reinforcement Learning under Threats

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward-generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), a framework that supports a decision maker against a potential adversary in RL. We further propose a level-$k$ thinking scheme, which yields a new learning framework for dealing with TMDPs. After introducing the framework and deriving theoretical results, we provide empirical evidence through extensive experiments, showing the benefits of accounting for adversaries while the agent learns.
