Deep Reinforcement Learning for Green Security Game with Online Information

Motivated by the urgent need in green security domains such as protecting endangered wildlife from poaching and preventing illegal logging, researchers have proposed game-theoretic models to optimize patrols conducted by law enforcement agencies. Despite these efforts, online information and online interactions (e.g., patrollers chasing poachers by following their footprints) have been neglected in previous game models and solutions. Our research aims to provide a more practical solution for complex real-world green security problems by empowering security games with deep reinforcement learning. Specifically, we propose a novel game model that incorporates the vital element of online information, and we discuss possible solutions as well as promising future research directions based on game theory and deep reinforcement learning.

Introduction and Research Problem

Game theory has become a well-established paradigm for addressing complex resource allocation and patrolling problems in security and sustainability domains. Models and algorithms have been proposed and studied extensively in the past decade, forming the area of “security games” (Tambe 2011). More recently, machine learning based models have been used to predict adversarial behaviors in green security domains such as wildlife poaching, and game-theoretic solutions built upon the learned behavioral models have been proposed (Xu et al. 2017; Gholami et al. 2017; Kar et al. 2017). Despite these efforts, a key element, online information, has been neglected in previous game models. For example, a well-trained ranger should be able to use the online information revealed by the traces left by a poacher (e.g., footprints, tree marks) to make flexible patrolling decisions rather than stick to a premeditated patrol route. Online information received during the interactions between the players thus plays an important role in the decision-making process, yet how to incorporate it into the solution remains an open question.

However, incorporating online information into green security games leads to significant complexity, inevitably resulting in games with sequential moves and imperfect information. This makes traditional mathematical programming-based approaches for computing the equilibrium of the game intractable. On the other hand, reinforcement learning (RL) (Sutton and Barto 1998) algorithms are designed to exploit online information. RL employs a goal-oriented learning scheme in which the agent learns to maximize its long-term cumulative reward by sequentially interacting with the environment. Recently, by employing the modeling power of deep learning, reinforcement learning has been successfully applied to a wide variety of tasks, including playing Atari games (Mnih et al. 2015) and Go (Silver et al. 2016), robotic manipulation (Gu et al. 2016), and sequential data generation (Yu et al. 2017). Furthermore, researchers have generalized single-agent RL methods to multi-agent systems where multiple agents coexist and interact with each other (Busoniu, Babuska, and De Schutter 2008).
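To make the learning scheme above concrete, the following is a minimal, self-contained sketch of the RL interaction loop it describes: an agent repeatedly observes a state, acts, receives a reward, and updates its value estimates so as to maximize long-term cumulative reward. The toy one-dimensional patrol environment, its reward, and all constants are our own illustrative assumptions, not the game model proposed in this paper; in the green security setting the state would additionally carry online information such as observed footprints, and a deep network would replace the Q-table.

```python
# Minimal tabular Q-learning sketch of the RL interaction loop (illustrative only).
import random
from collections import defaultdict

N_CELLS = 5           # patrol area discretized into cells (assumed toy setup)
TARGET = 3            # cell where a snare is hidden (assumed toy setup)
ACTIONS = [-1, 0, 1]  # move left, stay, move right
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term return

def step(state, action):
    """One environment transition: reward 1 for patrolling the target cell."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if next_state == TARGET else 0.0
    return next_state, reward

for episode in range(500):
    state = random.randrange(N_CELLS)
    for t in range(20):
        # epsilon-greedy action selection
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: bootstrap on the best estimated next action
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Greedy patrol policy learned for each cell
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_CELLS)})
```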
Therefore, to provide a more practical solution for complex real-world security problems, in this paper we propose a novel game-theoretic model that incorporates the vital online information commonly neglected in the literature, and we discuss potential algorithms that combine deep reinforcement learning and game theory to approximately compute equilibrium strategies in a complicated spatio-temporal setting with online interactions. We illustrate our model and algorithm in the domain of protecting wildlife from poaching, but note that the proposed solutions can also be applied to other green security domains such as protecting forests from illegal logging and protecting fisheries from overfishing.
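As one concrete illustration of how game theory and reinforcement learning could be combined, the sketch below implements a double-oracle loop in the spirit of the double oracle algorithm [6] and the policy-space approach to multi-agent RL [1]: each iteration solves the restricted zero-sum game over the strategies found so far and then asks each player's oracle for a best response to the opponent's current mixture. Everything here is a simplified stand-in of our own: the payoff matrix is a toy example, and the best-response "oracles" merely enumerate pure strategies, whereas in the setting described above they would train deep RL policies against the opponent's mixture over a spatio-temporal patrol game.

```python
# Hedged sketch of a double-oracle loop on a toy zero-sum matrix game (illustrative only).
import numpy as np
from scipy.optimize import linprog

# Defender (rows) vs. attacker (columns) payoffs; the numbers are purely illustrative.
PAYOFF = np.array([
    [ 3.0, -1.0, -2.0,  0.5],
    [-1.0,  2.0,  1.0, -0.5],
    [ 0.0,  1.5, -1.0,  2.0],
    [ 1.0, -2.0,  0.5,  1.0],
])

def solve_zero_sum(A):
    """Maximin LP: equilibrium mixture and value for the row player of A."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - sum_i x_i A[i, j] <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                     # mixture weights sum to one
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

def defender_oracle(A, cols, y):
    """Stand-in for a deep RL best-response trainer: best defender row vs. attacker mixture y."""
    return int(np.argmax(A[:, cols] @ y))

def attacker_oracle(A, rows, x):
    """Stand-in for the attacker's best-response trainer: best column vs. defender mixture x."""
    return int(np.argmin(x @ A[rows, :]))

def double_oracle(A):
    rows, cols = [0], [0]                      # start from arbitrary singleton strategy sets
    while True:
        sub = A[np.ix_(rows, cols)]
        x, value = solve_zero_sum(sub)         # defender mixture for the restricted game
        y, _ = solve_zero_sum(-sub.T)          # attacker mixture for the restricted game
        br_row = defender_oracle(A, cols, y)
        br_col = attacker_oracle(A, rows, x)
        expanded = False
        if br_row not in rows:
            rows.append(br_row); expanded = True
        if br_col not in cols:
            cols.append(br_col); expanded = True
        if not expanded:                       # neither oracle found a better policy: converged
            return rows, x, cols, y, value

rows, x, cols, y, value = double_oracle(PAYOFF)
print("defender support", rows, "mixture", np.round(x, 3))
print("attacker support", cols, "mixture", np.round(y, 3), "value", round(value, 3))
```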

References

[1] David Silver et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. NIPS, 2017.
[2] Avrim Blum et al. Planning in the Presence of Cost Functions Controlled by an Adversary. ICML, 2003.
[3] Milind Tambe. Security and Game Theory: Efficient Algorithms for Massive Security Games. 2011.
[4] Shane Legg et al. Human-level control through deep reinforcement learning. Nature, 2015.
[5] Bo An et al. Deploying PAWS: Field Optimization of the Protection Assistant for Wildlife Security. AAAI, 2016.
[6] Vincent Conitzer et al. A double oracle algorithm for zero-sum security games on graphs. AAMAS, 2011.
[7] David Silver et al. Deep Reinforcement Learning with Double Q-Learning. AAAI, 2015.
[8] Milind Tambe et al. Optimal patrol strategy for protecting moving targets with multiple mobile resources. AAMAS, 2013.
[9] Ilan Adler. The equivalence of linear programs and zero-sum games. International Journal of Game Theory, 2013.
[10] Sarit Kraus et al. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. AAMAS, 2008.
[11] Y. Mansour et al. Learning, Regret Minimization, and Equilibria. 2006.
[12] Vincent Conitzer et al. Security scheduling for real-world networks. AAMAS, 2013.
[13] Bo An et al. GUARDS and PROTECT: next generation applications of security games. SECO, 2011.
[14] Bart De Schutter et al. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2008.
[15] Haifeng Xu et al. Optimal Patrol Planning for Green Security Games with Black-Box Attackers. GameSec, 2017.
[16] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS, 1999.
[17] Milind Tambe et al. Cloudy with a Chance of Poaching: Adversary Behavior Modeling and Forecasting with Real-World Poaching Data. AAMAS, 2017.
[18] John N. Tsitsiklis et al. Actor-Critic Algorithms. NIPS, 1999.
[19] Milind Tambe et al. Taking It for a Test Drive: A Hybrid Spatio-Temporal Model for Wildlife Poaching Prediction Evaluated Through a Controlled Field Test. ECML/PKDD, 2017.
[20] Sergey Levine et al. Deep Reinforcement Learning for Robotic Manipulation. arXiv, 2016.
[21] Lantao Yu et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI, 2016.
[22] Peter Stone et al. Deep Recurrent Q-Learning for Partially Observable MDPs. AAAI Fall Symposia, 2015.
[23] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.