Markov Security Games: Learning in Spatial Security Problems

In this paper we present a preliminary investigation of modelling the spatial aspects of security games within the framework of Markov games. Reinforcement learning is a powerful tool for adaptation in unknown environments; however, the basic single-agent RL algorithms are ill-suited to adversarial scenarios. We therefore draw on Adversarial Multi-Armed Bandit (AMAB) methods, which are designed for exactly such settings. Building on temporal-difference methods, we derive two new multiagent algorithms that use AMAB methods for spatial two-player non-cooperative security games.
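The abstract does not spell out the two algorithms, but as a rough illustration of the combination it describes, the sketch below pairs Exp3-style adversarial-bandit action selection with an Expected-Sarsa-style temporal-difference update, maintained per state. This is a minimal sketch under stated assumptions, not the paper's method: the class name Exp3QAgent, the hyperparameters, and the use of Exp3 specifically are all illustrative choices.

```python
import numpy as np

class Exp3QAgent:
    """Sketch: per-state Exp3 action selection combined with a TD update.

    Assumptions (not from the paper): tabular states/actions, rewards
    rescaled to [0, 1], and an Expected-Sarsa-style TD target computed
    under the agent's own Exp3 policy.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, alpha=0.1, exp3_gamma=0.1):
        self.q = np.zeros((n_states, n_actions))  # TD value estimates
        self.w = np.ones((n_states, n_actions))   # Exp3 weights per state
        self.gamma = gamma                        # discount factor
        self.alpha = alpha                        # TD learning rate
        self.eg = exp3_gamma                      # Exp3 exploration rate
        self.nA = n_actions

    def policy(self, s):
        # Exp3 mixture: normalized weights plus uniform exploration.
        w = self.w[s]
        return (1.0 - self.eg) * w / w.sum() + self.eg / self.nA

    def act(self, s, rng):
        return rng.choice(self.nA, p=self.policy(s))

    def update(self, s, a, r, s_next):
        # TD target: expected value of the next state under the Exp3 policy.
        v_next = self.policy(s_next) @ self.q[s_next]
        self.q[s, a] += self.alpha * (r + self.gamma * v_next - self.q[s, a])
        # Exp3 weight update with an importance-weighted reward estimate,
        # which keeps the estimate unbiased against an adversarial opponent.
        p_a = self.policy(s)[a]
        self.w[s, a] *= np.exp(self.eg * (r / p_a) / self.nA)
```

The motivation for the bandit component is that, from either player's perspective, the opponent's simultaneous moves make the environment non-stationary, so the per-state action-selection problem behaves like an adversarial bandit rather than a stochastic one; Exp3 gives regret guarantees in exactly that regime, while the TD update propagates value across the spatial state space.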
