Deep Reinforcement Learning for Green Security Game with Online Information

Motivated by the urgent need in green security domains such as protecting endangered wildlife from poaching and preventing illegal logging, researchers have proposed game-theoretic models to optimize patrols conducted by law enforcement agencies. Despite these efforts, online information and online interactions (e.g., patrollers chasing poachers by following their footprints) have been neglected in previous game models and solutions. Our research aims to provide a more practical solution for complex real-world green security problems by empowering security games with deep reinforcement learning. Specifically, we propose a novel game model that incorporates the vital element of online information, and we discuss possible solutions as well as promising future research directions based on game theory and deep reinforcement learning.

Introduction and Research Problem

Game theory has become a well-established paradigm for addressing complex resource allocation and patrolling problems in security and sustainability domains. Models and algorithms have been proposed and studied extensively in the past decade, forming the area of “security games” (Tambe 2011). More recently, machine learning based models have been used to predict adversarial behaviors in green security domains such as wildlife poaching, and game-theoretic solutions built upon the learned behavioral models have been proposed (Xu et al. 2017; Gholami et al. 2017; Kar et al. 2017). Despite these efforts, a key element, online information, has been neglected in previous game models. For example, a well-trained ranger should be able to use the online information revealed by the traces left by a poacher (e.g., footprints, tree marks) to make flexible patrolling decisions rather than stick to a premeditated patrol route. Online information received during the interactions between the players thus plays an important role in the decision-making process, yet how to incorporate it into the solution remains an open question.

However, incorporating online information into green security games leads to significant complexity, inevitably resulting in games with sequential moves and imperfect information. This makes traditional mathematical programming-based approaches for computing the equilibrium of the game intractable. On the other hand, reinforcement learning (RL) (Sutton and Barto 1998) algorithms are designed to exploit online information. RL employs a goal-oriented learning scheme in which the agent learns to maximize its long-term cumulative reward by sequentially interacting with the environment. Recently, by employing the modeling power of deep learning, reinforcement learning has been successfully applied to a wide variety of tasks, including playing Atari games (Mnih et al. 2015) and Go (Silver et al. 2016), robotic manipulation (Gu et al. 2016), and sequential data generation (Yu et al. 2017). Furthermore, researchers have generalized single-agent RL methods to multi-agent systems where multiple agents coexist and interact with each other (Busoniu, Babuska, and De Schutter 2008).
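To make the learning scheme above concrete, the following is a minimal, self-contained sketch of the RL interaction loop it describes: an agent repeatedly observes a state, acts, receives a reward, and updates its value estimates so as to maximize long-term cumulative reward. The toy one-dimensional patrol environment, its reward, and all constants are our own illustrative assumptions, not the game model proposed in this paper; in the green security setting the state would additionally carry online information such as observed footprints, and a deep network would replace the Q-table.

```python
# Minimal tabular Q-learning sketch of the RL interaction loop (illustrative only).
import random
from collections import defaultdict

N_CELLS = 5           # patrol area discretized into cells (assumed toy setup)
TARGET = 3            # cell where a snare is hidden (assumed toy setup)
ACTIONS = [-1, 0, 1]  # move left, stay, move right
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term return

def step(state, action):
    """One environment transition: reward 1 for patrolling the target cell."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    reward = 1.0 if next_state == TARGET else 0.0
    return next_state, reward

for episode in range(500):
    state = random.randrange(N_CELLS)
    for t in range(20):
        # epsilon-greedy action selection
        if random.random() < EPS:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: bootstrap on the best estimated next action
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# Greedy patrol policy learned for each cell
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_CELLS)})
```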
Therefore, to provide a more practical solution for complex real-world security problems, in this paper we propose a novel game-theoretic model that incorporates the vital online information commonly neglected in the literature, and we discuss potential algorithms that combine deep reinforcement learning and game theory to approximately compute equilibrium strategies in a complicated spatio-temporal setting with online interactions. We illustrate our model and algorithm in the domain of protecting wildlife from poaching, but note that the proposed solutions can also be applied to other green security domains such as protecting forests from illegal logging and protecting fisheries from overfishing.
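As one concrete illustration of how game theory and reinforcement learning could be combined, the sketch below implements a double-oracle loop in the spirit of the double oracle algorithm [6] and the policy-space approach to multi-agent RL [1]: each iteration solves the restricted zero-sum game over the strategies found so far and then asks each player's oracle for a best response to the opponent's current mixture. Everything here is a simplified stand-in of our own: the payoff matrix is a toy example, and the best-response "oracles" merely enumerate pure strategies, whereas in the setting described above they would train deep RL policies against the opponent's mixture over a spatio-temporal patrol game.

```python
# Hedged sketch of a double-oracle loop on a toy zero-sum matrix game (illustrative only).
import numpy as np
from scipy.optimize import linprog

# Defender (rows) vs. attacker (columns) payoffs; the numbers are purely illustrative.
PAYOFF = np.array([
    [ 3.0, -1.0, -2.0,  0.5],
    [-1.0,  2.0,  1.0, -0.5],
    [ 0.0,  1.5, -1.0,  2.0],
    [ 1.0, -2.0,  0.5,  1.0],
])

def solve_zero_sum(A):
    """Maximin LP: equilibrium mixture and value for the row player of A."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                               # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])  # v - sum_i x_i A[i, j] <= 0 for every column j
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])                     # mixture weights sum to one
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

def defender_oracle(A, cols, y):
    """Stand-in for a deep RL best-response trainer: best defender row vs. attacker mixture y."""
    return int(np.argmax(A[:, cols] @ y))

def attacker_oracle(A, rows, x):
    """Stand-in for the attacker's best-response trainer: best column vs. defender mixture x."""
    return int(np.argmin(x @ A[rows, :]))

def double_oracle(A):
    rows, cols = [0], [0]                      # start from arbitrary singleton strategy sets
    while True:
        sub = A[np.ix_(rows, cols)]
        x, value = solve_zero_sum(sub)         # defender mixture for the restricted game
        y, _ = solve_zero_sum(-sub.T)          # attacker mixture for the restricted game
        br_row = defender_oracle(A, cols, y)
        br_col = attacker_oracle(A, rows, x)
        expanded = False
        if br_row not in rows:
            rows.append(br_row); expanded = True
        if br_col not in cols:
            cols.append(br_col); expanded = True
        if not expanded:                       # neither oracle found a better policy: converged
            return rows, x, cols, y, value

rows, x, cols, y, value = double_oracle(PAYOFF)
print("defender support", rows, "mixture", np.round(x, 3))
print("attacker support", cols, "mixture", np.round(y, 3), "value", round(value, 3))
```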

References

[1] David Silver et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. NIPS, 2017.
[2] Avrim Blum et al. Planning in the Presence of Cost Functions Controlled by an Adversary. ICML, 2003.
[3] Milind Tambe. Security and Game Theory: Efficient Algorithms for Massive Security Games. 2011.
[4] Shane Legg et al. Human-level control through deep reinforcement learning. Nature, 2015.
[5] Bo An et al. Deploying PAWS: Field Optimization of the Protection Assistant for Wildlife Security. AAAI, 2016.
[6] Vincent Conitzer et al. A double oracle algorithm for zero-sum security games on graphs. AAMAS, 2011.
[7] David Silver et al. Deep Reinforcement Learning with Double Q-Learning. AAAI, 2015.
[8] Milind Tambe et al. Optimal patrol strategy for protecting moving targets with multiple mobile resources. AAMAS, 2013.
[9] Ilan Adler. The equivalence of linear programs and zero-sum games. International Journal of Game Theory, 2013.
[10] Sarit Kraus et al. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. AAMAS, 2008.
[11] Y. Mansour et al. Learning, Regret Minimization, and Equilibria. 2006.
[12] Vincent Conitzer et al. Security scheduling for real-world networks. AAMAS, 2013.
[13] Bo An et al. GUARDS and PROTECT: next generation applications of security games. SECO, 2011.
[14] Bart De Schutter et al. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2008.
[15] Haifeng Xu et al. Optimal Patrol Planning for Green Security Games with Black-Box Attackers. GameSec, 2017.
[16] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS, 1999.
[17] Milind Tambe et al. Cloudy with a Chance of Poaching: Adversary Behavior Modeling and Forecasting with Real-World Poaching Data. AAMAS, 2017.
[18] John N. Tsitsiklis et al. Actor-Critic Algorithms. NIPS, 1999.
[19] Milind Tambe et al. Taking It for a Test Drive: A Hybrid Spatio-Temporal Model for Wildlife Poaching Prediction Evaluated Through a Controlled Field Test. ECML/PKDD, 2017.
[20] Sergey Levine et al. Deep Reinforcement Learning for Robotic Manipulation. arXiv, 2016.
[21] Lantao Yu et al. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI, 2016.
[22] Peter Stone et al. Deep Recurrent Q-Learning for Partially Observable MDPs. AAAI Fall Symposia, 2015.
[23] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.