Playing repeated Stackelberg games with unknown opponents

In Stackelberg games, a "leader" player first chooses a mixed strategy to commit to, then a "follower" player responds based on the observed leader strategy. Notable strides have been made in scaling up the algorithms for such games, but the problem of finding optimal leader strategies spanning multiple rounds of the game, with a Bayesian prior over unknown follower preferences, has been left unaddressed. Towards remedying this shortcoming we propose a first-of-a-kind tractable method to compute an optimal plan of leader actions in a repeated game against an unknown follower, assuming that the follower plays myopic best-response in every round. Our approach combines Monte Carlo Tree Search, dealing with leader exploration/exploitation tradeoffs, with a novel technique for the identification and pruning of dominated leader strategies. The method provably finds asymptotically optimal solutions and scales up to real world security games spanning double-digit number of rounds.

[1]  Drew Fudenberg,et al.  Game theory (3. pr.) , 1991 .

[2]  T. Basar,et al.  A game theoretic approach to decision and analysis in network intrusion detection , 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).

[3]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[4]  Olivier Teytaud,et al.  Modification of UCT with Patterns in Monte-Carlo Go , 2006 .

[5]  Csaba Szepesvári,et al.  Bandit Based Monte-Carlo Planning , 2006, ECML.

[6]  Sarit Kraus,et al.  An efficient heuristic approach for security against multiple adversaries , 2007, AAMAS '07.

[7]  Yngvi Björnsson,et al.  Simulation-Based Approach to General Game Playing , 2008, AAAI.

[8]  Sarit Kraus,et al.  Playing games for security: an efficient exact algorithm for solving Bayesian Stackelberg games , 2008, AAMAS.

[9]  Sarit Kraus,et al.  Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport , 2008, AAMAS 2008.

[10]  Tansu Alpcan,et al.  Security Games with Incomplete Information , 2009, 2009 IEEE International Conference on Communications.

[11]  Milind Tambe,et al.  Effective solutions for real-world Stackelberg games: when agents must deal with human uncertainties , 2009, AAMAS 2009.

[12]  Vincent Conitzer,et al.  Learning and Approximating the Optimal Strategy to Commit To , 2009, SAGT.

[13]  Bart Selman,et al.  Understanding Sampling-based Adversarial Search Methods , 2010, UAI 2010.

[14]  Bart Selman,et al.  Understanding Sampling Style Adversarial Search Methods , 2010, UAI.

[15]  Manish Jain,et al.  Security Games with Arbitrary Schedules: A Branch and Price Approach , 2010, AAAI.

[16]  Richard B. Segal,et al.  On the Scalability of Parallel UCT , 2010, Computers and Games.

[17]  Joel Veness,et al.  Monte-Carlo Planning in Large POMDPs , 2010, NIPS.

[18]  Milind Tambe,et al.  Approximation methods for infinite Bayesian Stackelberg games: modeling distributional payoff uncertainty , 2011, AAMAS.