Markov Security Games: Learning in Spatial Security Problems

In this paper we present a preliminary investigation of modelling the spatial aspects of security games within the framework of Markov games. Reinforcement learning is a powerful tool for adaptation in unknown environments; however, the basic single-agent RL algorithms are ill-suited to adversarial scenarios. We therefore draw on Adversarial Multi-Armed Bandit (AMAB) methods, which are designed for exactly such settings. Building on temporal-difference methods, we derive two new multiagent algorithms that use AMAB methods for spatial two-player non-cooperative security games.
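The abstract does not spell out the two algorithms, but as a rough illustration of the combination it describes, the sketch below pairs Exp3-style adversarial-bandit action selection with an Expected-Sarsa-style temporal-difference update, maintained per state. This is a minimal sketch under stated assumptions, not the paper's method: the class name Exp3QAgent, the hyperparameters, and the use of Exp3 specifically are all illustrative choices.

```python
import numpy as np

class Exp3QAgent:
    """Sketch: per-state Exp3 action selection combined with a TD update.

    Assumptions (not from the paper): tabular states/actions, rewards
    rescaled to [0, 1], and an Expected-Sarsa-style TD target computed
    under the agent's own Exp3 policy.
    """

    def __init__(self, n_states, n_actions, gamma=0.95, alpha=0.1, exp3_gamma=0.1):
        self.q = np.zeros((n_states, n_actions))  # TD value estimates
        self.w = np.ones((n_states, n_actions))   # Exp3 weights per state
        self.gamma = gamma                        # discount factor
        self.alpha = alpha                        # TD learning rate
        self.eg = exp3_gamma                      # Exp3 exploration rate
        self.nA = n_actions

    def policy(self, s):
        # Exp3 mixture: normalized weights plus uniform exploration.
        w = self.w[s]
        return (1.0 - self.eg) * w / w.sum() + self.eg / self.nA

    def act(self, s, rng):
        return rng.choice(self.nA, p=self.policy(s))

    def update(self, s, a, r, s_next):
        # TD target: expected value of the next state under the Exp3 policy.
        v_next = self.policy(s_next) @ self.q[s_next]
        self.q[s, a] += self.alpha * (r + self.gamma * v_next - self.q[s, a])
        # Exp3 weight update with an importance-weighted reward estimate,
        # which keeps the estimate unbiased against an adversarial opponent.
        p_a = self.policy(s)[a]
        self.w[s, a] *= np.exp(self.eg * (r / p_a) / self.nA)
```

The motivation for the bandit component is that, from either player's perspective, the opponent's simultaneous moves make the environment non-stationary, so the per-state action-selection problem behaves like an adversarial bandit rather than a stochastic one; Exp3 gives regret guarantees in exactly that regime, while the TD update propagates value across the spatial state space.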
