Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty

Green Security Games (GSGs) have been successfully used to protect valuable resources such as fisheries, forests, and wildlife. Real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication, in the presence of real-time, uncertain information. Previous game models do not address both of these stages simultaneously. Furthermore, existing solution approaches are difficult to adopt because they do not scale well to larger, more complex variants of these game models. We propose a novel GSG model to address these challenges. We also present a novel algorithm, CombSGPO, to compute a defender strategy for this game model. CombSGPO performs policy search over a multidimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. From a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, we find that strategic signaling emerges in the final learnt strategy.
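
The abstract does not give CombSGPO's update rule, so the following is only a rough sketch of the general idea of policy search over a factored discrete allocation space, under stated assumptions. It uses a plain score-function (REINFORCE) policy gradient with one independent categorical distribution per resource, choosing which target that resource is allocated to. All names (`policy_search`, `patrol_value`, `expected_value`, `greedy_allocation`) are hypothetical. `patrol_value` is a toy stand-in for the value of the defender's patrolling best response, which in the approach described above would come from a multi-agent Deep Q-Network rather than a closed-form function.

```python
import itertools
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_allocation(theta, rng):
    """Sample one target index per resource from a factored categorical policy."""
    allocation = []
    for logits in theta:
        probs = softmax(logits)
        allocation.append(rng.choices(range(len(probs)), weights=probs)[0])
    return allocation


def patrol_value(allocation, target_values):
    """HYPOTHETICAL stand-in for the patrolling best-response value.

    Toy assumption: the defender earns the value of every distinct target
    that receives at least one resource.
    """
    return sum(target_values[t] for t in set(allocation))


def expected_value(theta, target_values):
    """Exact expected value of the allocation policy, by enumeration (tiny problems only)."""
    probs = [softmax(logits) for logits in theta]
    total = 0.0
    for alloc in itertools.product(range(len(target_values)), repeat=len(theta)):
        p = 1.0
        for j, a in enumerate(alloc):
            p *= probs[j][a]
        total += p * patrol_value(alloc, target_values)
    return total


def policy_search(num_resources, target_values, iters=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline over the factored allocation policy."""
    rng = random.Random(seed)
    num_targets = len(target_values)
    theta = [[0.0] * num_targets for _ in range(num_resources)]
    baseline = 0.0
    for _ in range(iters):
        alloc = sample_allocation(theta, rng)
        ret = patrol_value(alloc, target_values)
        advantage = ret - baseline
        baseline += 0.05 * (ret - baseline)  # baseline updated after computing advantage
        for j, a in enumerate(alloc):
            probs = softmax(theta[j])
            for k in range(num_targets):
                # Gradient of log softmax probability of the sampled target a.
                grad = (1.0 if k == a else 0.0) - probs[k]
                theta[j][k] += lr * advantage * grad
    return theta


def greedy_allocation(theta):
    """Most likely target for each resource under the learned policy."""
    return [max(range(len(logits)), key=logits.__getitem__) for logits in theta]


if __name__ == "__main__":
    values = [1.0, 5.0, 3.0, 0.5]  # hypothetical target values
    learned = policy_search(num_resources=2, target_values=values)
    print(greedy_allocation(learned), expected_value(learned, values))
```

Factoring the policy into one categorical per resource keeps the number of parameters linear in resources times targets, instead of enumerating every joint allocation; the exact `expected_value` enumeration is included only as a diagnostic for toy sizes.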