Restless Poachers: Handling Exploration-Exploitation Tradeoffs in Security Domains

The success of Stackelberg Security Games (SSGs) in counter-terrorism domains has inspired interest in applying game-theoretic models to other security domains with frequent interactions between defenders and attackers, e.g., wildlife protection. Previous research optimizes the defender's strategy by modeling the problem as a repeated Stackelberg game, which captures these frequent interactions. However, that work fails to handle the exploration-exploitation tradeoff that arises in this domain because defenders only observe attack activities at the targets they protect. This paper addresses this shortcoming and provides the following contributions: (i) we formulate the problem as a restless multi-armed bandit (RMAB) to address this challenge; (ii) to plan patrol strategies with the Whittle index policy in the RMAB, we provide two sufficient conditions for indexability and an algorithm to evaluate indexability numerically; (iii) given indexability, we propose a binary-search-based algorithm that computes the Whittle index policy efficiently.
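To make the binary-search idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm, whose details are not given in the abstract). For a single restless arm with known transition matrices, the Whittle index of a state is the passive subsidy λ at which the active and passive actions become equally attractive; if the arm is indexable, the Q-value gap is monotone in λ, so λ can be found by binary search. All function names, the two-state arm, and the discounted-reward formulation below are assumptions for illustration only.

```python
import numpy as np

def q_values(state, lam, P_act, P_pas, r_act, r_pas, gamma=0.95, iters=2000):
    """Q-values of the active/passive actions in `state` for a single
    restless arm, when the passive action earns an extra subsidy `lam`.
    Computed by discounted value iteration over the arm's states."""
    n = len(r_act)
    V = np.zeros(n)
    for _ in range(iters):
        Qa = r_act + gamma * (P_act @ V)          # act (patrol the target)
        Qp = r_pas + lam + gamma * (P_pas @ V)    # rest, collecting the subsidy
        V = np.maximum(Qa, Qp)
    Qa = r_act + gamma * (P_act @ V)
    Qp = r_pas + lam + gamma * (P_pas @ V)
    return Qa[state], Qp[state]

def whittle_index(state, P_act, P_pas, r_act, r_pas, lo=-10.0, hi=10.0, tol=1e-6):
    """Binary search for the subsidy at which `state` is indifferent
    between acting and resting; valid when the arm is indexable, so the
    active-passive Q-value gap decreases monotonically in the subsidy."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        qa, qp = q_values(state, mid, P_act, P_pas, r_act, r_pas)
        if qa > qp:
            lo = mid   # acting still preferred: subsidy too small
        else:
            hi = mid   # resting preferred: subsidy too large
    return 0.5 * (lo + hi)
```

Under the Whittle index policy, the defender would then patrol the targets whose current states have the highest indices. Note the search is only sound given indexability, which is exactly why the paper's sufficient conditions and numerical indexability test matter.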
