Deep Reinforcement Learning for Green Security Games with Real-Time Information

Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints, and the agents’ subsequent actions upon receiving that information (e.g., rangers following footprints to chase a poacher), have been neglected in previous work. To fill this gap, we first propose a new game model, GSG-I, which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and policy-space response oracles: it solves a restricted game and iteratively adds best-response strategies to it by training deep Q-networks. Exploiting the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use deep Q-learning for security games.
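The double oracle loop underlying DeDOL can be sketched on a toy zero-sum matrix game. The sketch below is illustrative only and makes two simplifying assumptions: the restricted-game equilibrium is approximated with fictitious play rather than an exact solver, and the best-response oracles are computed by enumeration, whereas DeDOL trains deep Q-networks against the opponent's current mixed strategy. All function names here are hypothetical, not from the paper.

```python
def solve_restricted(payoff, D, A, rounds=2000):
    """Approximate an equilibrium of the restricted game via fictitious play."""
    cd, ca = [0] * len(D), [0] * len(A)   # empirical play counts
    cd[0] = ca[0] = 1
    for _ in range(rounds):
        # each side best-responds to the opponent's empirical mixture
        bi = max(range(len(D)),
                 key=lambda i: sum(ca[k] * payoff[D[i]][A[k]] for k in range(len(A))))
        bj = min(range(len(A)),
                 key=lambda j: sum(cd[k] * payoff[D[k]][A[j]] for k in range(len(D))))
        cd[bi] += 1
        ca[bj] += 1
    n, m = sum(cd), sum(ca)
    return [c / n for c in cd], [c / m for c in ca]


def double_oracle(payoff, def_strats, atk_strats, iters=50):
    """Double oracle on a zero-sum matrix game.

    payoff[i][j] is the defender's utility when the defender plays pure
    strategy i and the attacker plays pure strategy j (the attacker gets
    -payoff[i][j]). In DeDOL the two enumeration-based best responses
    below are replaced by trained deep Q-networks.
    """
    D, A = [0], [0]                      # restricted strategy sets
    for _ in range(iters):
        p, q = solve_restricted(payoff, D, A)
        # best responses over the FULL strategy sets
        br_d = max(def_strats,
                   key=lambda i: sum(q[k] * payoff[i][j] for k, j in enumerate(A)))
        br_a = min(atk_strats,
                   key=lambda j: sum(p[k] * payoff[i][j] for k, i in enumerate(D)))
        if br_d in D and br_a in A:      # nothing outside the restricted
            break                        # game improves: (approx.) converged
        if br_d not in D:
            D.append(br_d)
        if br_a not in A:
            A.append(br_a)
    return p, q, D, A


# rock-paper-scissors as a tiny stand-in for the patrolling game
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
p, q, D, A = double_oracle(rps, range(3), range(3))
print(sorted(D), sorted(A))  # → [0, 1, 2] [0, 1, 2]
```

In PSRO terms, `solve_restricted` is the meta-solver and the two best-response computations are the oracles; replacing enumeration with reinforcement learning over policies recovers the DeDOL-style pipeline described in the abstract.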
