Regret Minimization in Behaviorally-Constrained Zero-Sum Games

No-regret learning has emerged as a powerful tool for solving extensive-form games. This was facilitated by the counterfactual regret minimization (CFR) framework, which relies on instantiating a regret minimizer over the probability simplex at each information set of the game. We use an instantiation of the CFR framework to develop algorithms for solving behaviorally-constrained (and, as a special case, Selten-perturbed) extensive-form games, which allows us to compute approximate Nash equilibrium refinements. Nash equilibrium refinements are motivated by a major deficiency of Nash equilibrium: it provides virtually no guarantees about play in parts of the game tree that are reached with zero probability. Refinements can mend this issue, but they have not been adopted in practice, mostly due to a lack of scalable algorithms. We show that, compared to standard algorithms, our method finds solutions with substantially better refinement properties, while enjoying a convergence rate comparable to that of state-of-the-art algorithms for Nash equilibrium computation, both in theory and in practice.
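To make the local construction concrete, the sketch below (Python/NumPy; not the authors' implementation) shows the kind of regret minimizer CFR would instantiate at a single information set when every action probability is constrained to stay above a Selten-style lower bound. The class name, the affine reparameterization onto the full simplex, and the stand-in utilities in the usage example are illustrative assumptions.

```python
import numpy as np

class PerturbedSimplexRegretMatcher:
    """Regret matching over a lower-bounded simplex {x : x >= lb, sum(x) = 1}.

    This is the kind of local decision set a behaviorally-constrained
    (Selten-perturbed) information set induces.  We reparameterize
    x = lb + (1 - sum(lb)) * y with y on the full simplex, so an ordinary
    regret matcher on y is also a regret minimizer on the constrained set.
    """

    def __init__(self, lower_bounds):
        self.lb = np.asarray(lower_bounds, dtype=float)
        if self.lb.sum() >= 1.0:
            raise ValueError("lower bounds must leave some free probability mass")
        self.free_mass = 1.0 - self.lb.sum()
        self.cum_regret = np.zeros_like(self.lb)

    def _inner_strategy(self):
        # Standard regret matching on the unconstrained simplex: play
        # positive cumulative regrets proportionally, uniform otherwise.
        pos = np.maximum(self.cum_regret, 0.0)
        if pos.sum() > 0:
            return pos / pos.sum()
        return np.full(len(self.lb), 1.0 / len(self.lb))

    def strategy(self):
        # Map the inner simplex strategy into the constrained set.
        return self.lb + self.free_mass * self._inner_strategy()

    def observe(self, utilities):
        # Update cumulative regrets with the counterfactual utilities
        # observed at this information set on the current iteration.
        u = np.asarray(utilities, dtype=float)
        y = self._inner_strategy()
        self.cum_regret += u - u.dot(y)


# Tiny usage example: three actions with a 5% floor on each probability.
if __name__ == "__main__":
    rm = PerturbedSimplexRegretMatcher([0.05, 0.05, 0.05])
    for _ in range(1000):
        rm.observe([1.0, 0.2, -0.3])   # stand-in counterfactual utilities
    print(rm.strategy())               # mass concentrates on action 0, floors respected
```

The reparameterization works because the constrained set is an affine image of the simplex: the regret accumulated by the inner regret matcher bounds, up to the constant free-mass factor, the regret measured over the constrained set, so the standard regret-matching guarantee carries over.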
