Policy Learning for Continuous Space Security Games Using Neural Networks

A wealth of algorithms centered around (integer) linear programming has been proposed to compute equilibrium strategies in security games with discrete states and actions. In practice, however, many domains possess continuous state and action spaces. In this paper, we consider a continuous-space security game model with infinite-size action sets for players and present a novel deep-learning-based approach to extend the existing toolkit for solving security games. Specifically, we present (i) OptGradFP, a novel and general algorithm that searches for the optimal defender strategy in a parameterized continuous search space and can also be used to learn policies over multiple game states simultaneously; and (ii) OptGradFP-NN, a convolutional neural network-based implementation of OptGradFP for continuous-space security games. We demonstrate the potential to predict good defender strategies via experiments and analysis of OptGradFP and OptGradFP-NN in discrete and continuous game settings.
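To make the high-level description concrete, the sketch below illustrates the general flavor of alternating policy-gradient updates with fictitious-play-style sampling from each player's strategy history, which is how the abstract characterizes OptGradFP. It is a minimal NumPy sketch under invented assumptions (a single defender resource, a 1-D target space, Gaussian policies for both players, and a zero-sum toy payoff `toy_payoff`), not the paper's actual OptGradFP or its convolutional OptGradFP-NN implementation.

```python
# Illustrative OptGradFP-style loop (NumPy only). All modeling choices here
# (1-D targets, Gaussian policies, zero-sum toy payoff) are assumptions made
# for the sketch; the paper's algorithm uses richer parameterizations and a
# convolutional network in OptGradFP-NN.
import numpy as np

rng = np.random.default_rng(0)

def toy_payoff(d, a):
    """Toy defender payoff: high when the defender covers the attacked point."""
    return np.exp(-10.0 * (d - a) ** 2) - 0.5

def sample(mean, log_std, n):
    """Sample actions and return score-function gradients wrt (mean, log_std)."""
    std = np.exp(log_std)
    x = rng.normal(mean, std, size=n)
    g_mean = (x - mean) / std ** 2
    g_logstd = ((x - mean) ** 2 / std ** 2) - 1.0
    return x, g_mean, g_logstd

def_params = np.array([0.0, np.log(0.5)])   # defender policy: [mean, log_std]
att_params = np.array([0.5, np.log(0.5)])   # attacker policy: [mean, log_std]
def_history = [def_params.copy()]           # fictitious-play-style memories
att_history = [att_params.copy()]

lr, batch = 0.05, 256
for it in range(200):
    # Defender policy-gradient step against a strategy drawn from the
    # attacker's history (fictitious-play-style averaging).
    old_att = att_history[rng.integers(len(att_history))]
    d, gd_m, gd_s = sample(def_params[0], def_params[1], batch)
    a, _, _ = sample(old_att[0], old_att[1], batch)
    u = toy_payoff(d, a)
    adv = u - u.mean()                       # baseline reduces gradient variance
    def_params += lr * np.array([(adv * gd_m).mean(), (adv * gd_s).mean()])

    # Attacker policy-gradient step against the defender's history.
    old_def = def_history[rng.integers(len(def_history))]
    d, _, _ = sample(old_def[0], old_def[1], batch)
    a, ga_m, ga_s = sample(att_params[0], att_params[1], batch)
    u = -toy_payoff(d, a)                    # zero-sum assumption for the sketch
    adv = u - u.mean()
    att_params += lr * np.array([(adv * ga_m).mean(), (adv * ga_s).mean()])

    def_history.append(def_params.copy())
    att_history.append(att_params.copy())

print("defender mean/std:", def_params[0], np.exp(def_params[1]))
```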
