Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions

We study the performance of optimistic regret-minimization algorithms for both minimizing regret in, and computing Nash equilibria of, zero-sum extensive-form games. Applying these algorithms to extensive-form games requires a distance-generating function. We study the use of the dilated entropy and dilated Euclidean distance functions. For the dilated Euclidean distance function, we prove the first explicit bounds on the strong-convexity parameter for general treeplexes. Furthermore, we show that the use of dilated distance-generating functions enables us to decompose the mirror descent algorithm, and its optimistic variant, into local mirror descent algorithms at each information set. This decomposition mirrors the structure of the counterfactual regret minimization framework, and enables important techniques in practice, such as distributed updates and pruning of cold parts of the game tree. Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms. We experimentally compare to the popular algorithm CFR+, which has a theoretical convergence rate of $T^{-0.5}$ but is known to often converge at a rate of $T^{-1}$, or better, in practice. We give an example matrix game where CFR+ experimentally converges at a relatively slow rate of $T^{-0.74}$, whereas our optimistic methods converge faster than $T^{-1}$. We go on to show that our fast rate also holds in the Kuhn poker game, which is an extensive-form game. For games with deeper game trees, however, we find that CFR+ is still faster. Finally, we show that when the goal is minimizing regret, rather than computing a Nash equilibrium, our optimistic methods can outperform CFR+, even in deep game trees.
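To make the matrix-game setting mentioned in the abstract concrete, the following is a minimal sketch of an optimistic regret minimizer in self-play. It is not the paper's implementation: it assumes optimistic follow-the-regularized-leader with the ordinary entropy regularizer on a single simplex (a matrix game has only one decision point per player, so no dilation is involved), and the payoff matrix, the step size `eta`, and the function names are all hypothetical choices made for illustration. The row player minimizes $x^\top A y$ and the column player maximizes it.

```python
import math


def softmax_neg(scores):
    """Return the distribution proportional to exp(-score), computed stably."""
    lo = min(scores)
    weights = [math.exp(-(s - lo)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge(A, T, eta=0.1):
    """Self-play with optimistic entropy-regularized FTRL on a matrix game.

    Each player uses its most recent loss vector as the prediction of the
    next one. Returns the uniform averages of both players' iterates.
    The step size eta is an illustrative assumption, not a tuned value.
    """
    n, m = len(A), len(A[0])
    cum_x, cum_y = [0.0] * n, [0.0] * m    # cumulative observed losses
    pred_x, pred_y = [0.0] * n, [0.0] * m  # predictions of the next loss
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        # Play against cumulative loss plus the optimistic prediction.
        x = softmax_neg([eta * (cum_x[i] + pred_x[i]) for i in range(n)])
        y = softmax_neg([eta * (cum_y[j] + pred_y[j]) for j in range(m)])
        # Row player's loss is A y; column player's loss is -A^T x.
        loss_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        loss_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        cum_x = [c + l for c, l in zip(cum_x, loss_x)]
        cum_y = [c + l for c, l in zip(cum_y, loss_y)]
        pred_x, pred_y = loss_x, loss_y
        avg_x = [a + v / T for a, v in zip(avg_x, x)]
        avg_y = [a + v / T for a, v in zip(avg_y, y)]
    return avg_x, avg_y


def saddle_point_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    n, m = len(A), len(A[0])
    best_for_y = max(sum(A[i][j] * x[i] for i in range(n)) for j in range(m))
    best_for_x = min(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    return best_for_y - best_for_x


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # hypothetical 2x2 payoff matrix
    for T in (100, 1000, 10000):
        x_bar, y_bar = optimistic_hedge(A, T)
        print(T, saddle_point_gap(A, x_bar, y_bar))
```

The saddle-point gap of the averaged strategies is the quantity whose decay rate in $T$ the abstract refers to; running the script for increasing `T` gives a rough sense of how quickly it shrinks in this toy game. Extending the idea to extensive-form games would require the dilated distance-generating functions studied in the paper, which this sketch does not attempt.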

Tuomas Sandholm | Christian Kroer | Gabriele Farina
