Stochastic Regret Minimization in Extensive-Form Games

Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full tree traversals. It works by using gradient estimates that can be computed via sampling. However, stochastic methods for sequential games have not been investigated extensively beyond MCCFR. In this paper we develop a new framework for constructing stochastic regret minimization methods. This framework allows us to use any regret-minimization algorithm, coupled with any gradient estimator. The MCCFR algorithm can be analyzed as a special case of our framework, and this analysis leads to significantly stronger theoretical results on convergence, while simultaneously yielding a simplified proof. Our framework allows us to instantiate several new stochastic methods for solving sequential games. We show extensive experiments on three games, where some variants of our methods outperform MCCFR.
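To make the idea of coupling a regret minimizer with a sampled gradient estimator concrete, the following is a minimal sketch on a two-player zero-sum matrix game rather than a full extensive-form game. It is not the paper's implementation: the choice of regret matching as the regret minimizer, the single-sample estimator, and all function names (`regret_matching`, `solve_matrix_game`, `saddle_point_gap`) are assumptions made for illustration. Each player feeds an unbiased one-sample estimate of its loss gradient into regret matching, and the averaged strategies are checked with the saddle-point gap.

```python
import random


def regret_matching(regrets):
    """Play proportionally to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def solve_matrix_game(A, iterations, seed=0):
    """Row player minimizes x^T A y, column player maximizes it.

    Hypothetical helper: each player only sees a sampled estimate of its
    loss gradient, obtained by sampling one opponent action.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    regret_x, regret_y = [0.0] * n_rows, [0.0] * n_cols
    sum_x, sum_y = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching(regret_x)
        y = regret_matching(regret_y)
        # Sample one action per player from the current strategies.
        i = rng.choices(range(n_rows), weights=x)[0]
        j = rng.choices(range(n_cols), weights=y)[0]
        # Column j of A is an unbiased estimate of A y (row player's loss).
        loss_x = [A[a][j] for a in range(n_rows)]
        # Negated row i of A is an unbiased estimate of -A^T x (column's loss).
        loss_y = [-A[i][b] for b in range(n_cols)]
        # Regret update: expected loss under own strategy minus action loss.
        exp_x = sum(p * l for p, l in zip(x, loss_x))
        exp_y = sum(p * l for p, l in zip(y, loss_y))
        regret_x = [r + exp_x - l for r, l in zip(regret_x, loss_x)]
        regret_y = [r + exp_y - l for r, l in zip(regret_y, loss_y)]
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + p for s, p in zip(sum_y, y)]
    avg_x = [s / iterations for s in sum_x]
    avg_y = [s / iterations for s in sum_y]
    return avg_x, avg_y


def saddle_point_gap(A, x, y):
    """Best-response value against x minus best-response value against y."""
    best_col = max(sum(x[i] * A[i][j] for i in range(len(A)))
                   for j in range(len(A[0])))
    best_row = min(sum(A[i][j] * y[j] for j in range(len(A[0])))
                   for i in range(len(A)))
    return best_col - best_row


# Rock-paper-scissors, entries are the row player's losses.
RPS = [[0.0, 1.0, -1.0],
       [-1.0, 0.0, 1.0],
       [1.0, -1.0, 0.0]]
avg_x, avg_y = solve_matrix_game(RPS, iterations=20000, seed=0)
print(avg_x, avg_y, saddle_point_gap(RPS, avg_x, avg_y))
```

Swapping `regret_matching` for another regret minimizer, or replacing the one-sample estimate with a different unbiased estimator, leaves the rest of the loop unchanged, which is the kind of modularity the abstract describes. Extending the sketch to extensive-form games would require a tree or sequence-form representation that is not shown here.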
