Safe Search for Stackelberg Equilibria in Extensive-Form Games

A Stackelberg equilibrium is a solution concept for two-player games in which the leader can commit to a strategy before the follower acts. In recent years, it has become a cornerstone of many security applications, including airport patrolling and wildlife poaching prevention. Even though many of these settings are sequential in nature, existing techniques pre-compute the entire solution ahead of time. In this paper, we present a theoretically sound and empirically effective way to apply search, that is, the use of extra online computation to improve a solution, to the computation of Stackelberg equilibria in general-sum games. Instead of attempting to solve the full game upfront, the leader first computes an approximate “blueprint” solution offline and then improves it online for the particular subgames encountered in actual play. We prove that our search technique is guaranteed to perform no worse than the pre-computed blueprint strategy, and we demonstrate empirically that it makes it possible to approximately solve significantly larger games than purely offline methods can handle. We also show that our search operation can be cast as a smaller Stackelberg problem, which makes our method complementary to existing algorithms based on strategy generation.
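To make the notion of leader commitment concrete, the following minimal sketch computes an optimal commitment in a small normal-form game by grid search. It is an illustration of the solution concept only, not the search method of the paper. The 2x2 payoff matrices, the function names `leader_value` and `best_commitment`, and the assumption that the follower breaks ties in the leader's favor (the strong Stackelberg convention) are all choices made for this example.

```python
from fractions import Fraction

# Hypothetical 2x2 game (an assumption for illustration).
# Rows are leader actions, columns are follower actions.
LEADER = [[2, 4], [1, 3]]
FOLLOWER = [[1, 0], [0, 1]]


def leader_value(p, leader=LEADER, follower=FOLLOWER):
    """Leader's expected payoff when committing to row 0 with probability p.

    The follower best-responds to the commitment; ties are broken in the
    leader's favor (assumed strong Stackelberg convention).
    """
    mix = [p, 1 - p]
    cols = range(len(leader[0]))
    f_util = [sum(mix[r] * follower[r][c] for r in range(2)) for c in cols]
    l_util = [sum(mix[r] * leader[r][c] for r in range(2)) for c in cols]
    best = max(f_util)
    return max(l_util[c] for c in cols if f_util[c] == best)


def best_commitment(steps=100):
    """Grid search over commitment probabilities using exact fractions."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    p = max(grid, key=leader_value)
    return p, leader_value(p)


if __name__ == "__main__":
    p, value = best_commitment()
    print(f"commit to row 0 with probability {p}, leader value {value}")
```

In this example game, row 0 is dominant for the leader, so without commitment the follower would answer with column 0 and the leader would receive 2. Committing to an even mixture makes the follower indifferent, and the leader's expected payoff rises to 7/2. Exact fractions are used so that the follower's indifference point is detected without floating-point tolerance.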
