Theoretical and Practical Advances on Smoothing for Extensive-Form Games

Sparse iterative methods, in particular first-order methods, are known to be among the most effective for solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function on which they are based. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function, a class of distance-generating functions tailored to the strategy domains of extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has only a logarithmic dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of Ω(b^d · d), where b is the branching factor of the player and d is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning, we show that, for the first time, the excessive gap technique can be made faster in practice than the fastest counterfactual regret minimization algorithm, CFR+.
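As a rough illustration of what a weighted dilated entropy function looks like, the following Python sketch evaluates one on a tiny two-level strategy space. It assumes the commonly used form in which each decision point contributes its weight, times the probability mass reaching it, times the negative entropy of the local action distribution. The sequence names, the tree layout, and the weights 2.0 and 1.0 are hypothetical choices for illustration only; they are not the weighting scheme developed in the paper.

```python
import math


def neg_entropy(probs):
    """Negative entropy sum(p * log p) of a probability vector (0 * log 0 := 0)."""
    return sum(p * math.log(p) for p in probs if p > 0.0)


def dilated_entropy(x, simplexes):
    """Evaluate a weighted dilated entropy at a sequence-form strategy x.

    x         : dict mapping sequence name -> sequence-form probability
    simplexes : list of (parent, children, weight) tuples, one per decision
                point; parent is None at the root (incoming mass 1.0)
    """
    total = 0.0
    for parent, children, weight in simplexes:
        mass = 1.0 if parent is None else x[parent]
        if mass <= 0.0:
            continue  # an unreached decision point contributes nothing
        local = [x[c] / mass for c in children]  # behavioral probabilities
        total += weight * mass * neg_entropy(local)
    return total


# Hypothetical tree: root actions "a" and "b"; after "a", actions "c" and "d".
x = {"a": 0.5, "b": 0.5, "a/c": 0.25, "a/d": 0.25}
simplexes = [
    (None, ["a", "b"], 2.0),     # root decision point, illustrative weight
    ("a", ["a/c", "a/d"], 1.0),  # child decision point, illustrative weight
]
value = dilated_entropy(x, simplexes)
print(value)
```

For the uniform strategy above, the root contributes 2.0 · (−ln 2) and the child contributes 1.0 · 0.5 · (−ln 2), so the value equals −2.5 ln 2. Changing the per-decision-point weights is the design knob the abstract refers to.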
