Online Learning and Planning in Resource Conservation Games

Protecting our environment and natural resources is a major global challenge. “Protectors” (law enforcement agencies) try to protect these natural resources, while “extractors” (criminals) seek to exploit them. In many domains, such as illegal fishing, the extractors know more about the distribution and richness of the resources than the protectors, making it extremely difficult for the protectors to optimally allocate their assets for patrol and interdiction. Fortunately, extractors carry out frequent illegal extractions, so protectors can learn the richness of resources by observing the extractors’ behavior. This paper presents an approach for allocating protector assets based on learning from extractors. We make the following four specific contributions: (i) we model resource conservation as a repeated game and transform this repeated game into a POMDP, which cannot be solved by the latest general POMDP solvers due to its exponential state space; (ii) in response, we propose GMOP, a dedicated algorithm that combines Gibbs sampling with Monte Carlo tree search for online planning in this POMDP; (iii) for a specific class of our game, we speed up the GMOP algorithm without sacrificing solution quality, and provide a heuristic that trades off solution quality for lower computational cost; (iv) we explore the continuous-utility scenario, in which the POMDP becomes a continuous-state POMDP, and provide a solution for special cases.
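To make the GMOP idea concrete, the following is a minimal, hypothetical sketch of a Gibbs-sampling-plus-MCTS loop: hidden states (per-site resource richness) are sampled from a posterior implied by observed extractions, and a UCT-style search over patrol actions is evaluated against those samples. The POMDP model, likelihood, reward, and all function and parameter names (gibbs_sample_states, mcts_plan, n_sweeps, c_uct, etc.) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch in the spirit of GMOP: Gibbs sampling over hidden
# resource richness, then UCT-style MCTS over protector patrol actions.
# All models and names here are illustrative assumptions.
import math
import random
from collections import defaultdict


def gibbs_sample_states(observations, n_sites, n_samples=50, n_sweeps=20):
    """Sample per-site richness levels from a posterior over observed
    extraction sites, resampling one site per Gibbs step (placeholder
    likelihood: richer sites attract more observed extractions)."""
    levels = [0, 1, 2]  # assumed discrete richness levels
    state = [random.choice(levels) for _ in range(n_sites)]
    samples = []
    for sweep in range(n_sweeps * n_samples):
        site = sweep % n_sites
        visits = sum(1 for obs in observations if obs == site)
        weights = [math.exp(lvl * visits) for lvl in levels]
        r, acc = random.random() * sum(weights), 0.0
        for lvl, w in zip(levels, weights):
            acc += w
            if r <= acc:
                state[site] = lvl
                break
        if sweep % n_sweeps == n_sweeps - 1:
            samples.append(list(state))
    return samples


def mcts_plan(sampled_states, n_sites, n_iter=500, c_uct=1.4):
    """One-step UCT over patrol targets, scoring each action against
    states drawn by the Gibbs sampler."""
    visits = defaultdict(int)
    value = defaultdict(float)
    for t in range(1, n_iter + 1):
        state = random.choice(sampled_states)

        def ucb(a):
            if visits[a] == 0:
                return float("inf")
            return value[a] / visits[a] + c_uct * math.sqrt(math.log(t) / visits[a])

        action = max(range(n_sites), key=ucb)
        # Placeholder reward: interdiction succeeds if the extractor,
        # assumed to target the richest site, is patrolled.
        extractor_target = max(range(n_sites), key=lambda s: state[s])
        reward = 1.0 if action == extractor_target else 0.0
        visits[action] += 1
        value[action] += reward
    return max(range(n_sites), key=lambda a: visits[a])


if __name__ == "__main__":
    obs = [2, 2, 1, 2, 0, 2]  # toy history of observed extraction sites
    beliefs = gibbs_sample_states(obs, n_sites=3)
    print("patrol site:", mcts_plan(beliefs, n_sites=3))
```

This sketch only illustrates the division of labor (belief sampling feeding a sampling-based planner); the actual GMOP algorithm plans over a multi-step horizon in the repeated-game POMDP.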
