Double-oracle algorithm for computing an exact Nash equilibrium in zero-sum extensive-form games

We investigate an iterative algorithm for computing an exact Nash equilibrium in two-player zero-sum extensive-form games with imperfect information. The approach combines the sequence-form representation of extensive-form games with the double-oracle algorithmic framework. The main idea is to restrict the game by allowing the players to play only some of the sequences of available actions, solve this restricted game, and then use fast best-response algorithms to add further sequences to the restricted game for the next iteration. In this paper we (1) extend the sequence-form double-oracle method to non-deterministic extensive-form games, (2) present more efficient methods for maintaining a valid restricted game and computing best-response sequences, and (3) provide theoretical guarantees of the algorithm's convergence to a Nash equilibrium. We experimentally evaluate our algorithm on two types of games: a search game on a graph and simplified variants of Poker. The results show significant running-time improvements over the previous variant of the double-oracle algorithm, and demonstrate the ability to find exact solutions of much larger games than is possible by solving the full linear program for the complete game.
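The restrict, solve, and expand loop described above can be illustrated on a much simpler setting. The following is a minimal sketch, assuming a zero-sum normal-form (matrix) game rather than the sequence-form representation used in the paper; it does not handle the validity of restricted extensive-form games or the specialized best-response algorithms. The function names `solve_restricted` and `double_oracle`, the tolerance `tol`, and the choice of SciPy's `linprog` as the LP solver are all assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import linprog


def solve_restricted(A):
    """Maximin mixed strategy and value for the row player of matrix game A.

    Hypothetical helper: the row player maximizes, the column player minimizes.
    """
    m, n = A.shape
    # Variables are x_0..x_{m-1} and v; maximizing v is minimizing -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j: v - sum_i x_i * A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The probabilities x_i sum to one (v is excluded from the sum).
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]


def double_oracle(A, tol=1e-8):
    """Double-oracle loop on a matrix game (illustrative sketch only)."""
    A = np.asarray(A, dtype=float)
    rows, cols = [0], [0]  # arbitrary initial restricted strategy sets
    while True:
        # Solve the restricted game for both players.
        sub = A[np.ix_(rows, cols)]
        x_sub, value = solve_restricted(sub)
        y_sub, _ = solve_restricted(-sub.T)  # column player's point of view
        # Embed the restricted strategies in the full game.
        x = np.zeros(A.shape[0])
        x[rows] = x_sub
        y = np.zeros(A.shape[1])
        y[cols] = y_sub
        # Best-response oracles search the full strategy sets.
        row_payoffs = A @ y
        col_payoffs = x @ A
        br_row = int(np.argmax(row_payoffs))
        br_col = int(np.argmin(col_payoffs))
        gap = row_payoffs[br_row] - col_payoffs[br_col]
        # Expand the restricted game with any new best responses.
        added = False
        if br_row not in rows:
            rows.append(br_row)
            added = True
        if br_col not in cols:
            cols.append(br_col)
            added = True
        # Stop when neither player can improve (or nothing new was found).
        if gap <= tol or not added:
            return x, y, value, sorted(rows), sorted(cols)


if __name__ == "__main__":
    rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors
    x, y, value, rows, cols = double_oracle(rps)
    print(x, y, value)
```

In this sketch the upper and lower bounds given by the two best-response values close exactly when the restricted equilibrium is also an equilibrium of the full game. When a game has an equilibrium with small support, the loop can stop before every strategy has been added, which is the source of the savings that the sequence-form variant aims to obtain in extensive-form games.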
