Composability of Regret Minimizers

Regret minimization is a powerful tool for solving large-scale problems; it was recently used in breakthrough results for large-scale extensive-form-game solving. This was achieved by composing simplex regret minimizers into an overall regret-minimization framework for extensive-form-game strategy spaces. In this paper we study the general composability of regret minimizers. We derive a calculus for constructing regret minimizers for complex convex sets that are built from simpler convex sets via convexity-preserving operations. In particular, we show that local regret minimizers for the simpler sets can be composed with additional regret minimizers into an aggregate regret minimizer for the complex set. As an application, we show that the counterfactual regret minimization (CFR) framework can be constructed easily from our calculus. We also show how to construct a CFR variant for extensive-form games with strategy constraints. Unlike a recently proposed variant of CFR for strategy constraints by Davis, Waugh, and Bowling (2018), the algorithm resulting from our calculus does not depend on any unknown constants and thus avoids binary search.
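To make the idea of composition concrete, the following is a minimal sketch and not the paper's implementation. It assumes linear losses and uses regret matching as the simplex regret minimizer; the abstract does not list the operations in the calculus, so Cartesian product and convex hull are chosen here as two familiar convexity-preserving operations. All class and method names (RegretMatching, ProductMinimizer, ConvexHullMinimizer, recommend, observe) are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class RegretMatching:
    """Regret matching on the n-simplex for linear losses (assumed base case)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.cum_regret = [0.0] * n
        self.last = [1.0 / n] * n

    def recommend(self) -> List[float]:
        # Play proportionally to positive cumulative regret, else uniformly.
        pos = [max(r, 0.0) for r in self.cum_regret]
        total = sum(pos)
        self.last = [p / total for p in pos] if total > 0 else [1.0 / self.n] * self.n
        return self.last

    def observe(self, loss: Sequence[float]) -> None:
        # Regret of action i grows by (loss incurred) - (loss of action i).
        incurred = dot(self.last, loss)
        for i in range(self.n):
            self.cum_regret[i] += incurred - loss[i]


class ProductMinimizer:
    """Cartesian product X x Y: run both minimizers independently.

    The loss vector is split at left_dim; the regrets of the parts add up.
    """

    def __init__(self, left, right, left_dim: int) -> None:
        self.left, self.right, self.left_dim = left, right, left_dim

    def recommend(self) -> List[float]:
        return self.left.recommend() + self.right.recommend()

    def observe(self, loss: Sequence[float]) -> None:
        self.left.observe(loss[: self.left_dim])
        self.right.observe(loss[self.left_dim:])


class ConvexHullMinimizer:
    """Convex hull of X and Y (same dimension): mix the two recommendations.

    An additional simplex minimizer picks the mixing weights; it observes
    the loss each child's own recommendation would have incurred.
    """

    def __init__(self, left, right) -> None:
        self.left, self.right = left, right
        self.mixer = RegretMatching(2)
        self.x: List[float] = []
        self.y: List[float] = []

    def recommend(self) -> List[float]:
        lam = self.mixer.recommend()
        self.x, self.y = self.left.recommend(), self.right.recommend()
        return [lam[0] * a + lam[1] * b for a, b in zip(self.x, self.y)]

    def observe(self, loss: Sequence[float]) -> None:
        self.left.observe(loss)
        self.right.observe(loss)
        self.mixer.observe([dot(loss, self.x), dot(loss, self.y)])


def run(minimizer, losses: Sequence[Sequence[float]]) -> List[float]:
    """Feed a loss sequence to a minimizer and return its next recommendation."""
    for loss in losses:
        minimizer.recommend()
        minimizer.observe(loss)
    return minimizer.recommend()
```

Because every composite exposes the same recommend/observe interface as its parts, composites can be nested, for example a ProductMinimizer whose factors are themselves ConvexHullMinimizer instances. This is the sense in which local regret minimizers plus an additional one (the mixer above) yield an aggregate regret minimizer for the larger set.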

[1] Peter Bro Miltersen, et al. Computing a quasi-perfect equilibrium of a two-player game, 2010.

[2] B. Stengel, et al. Efficient Computation of Behavior Strategies, 1996.

[3] Tuomas Sandholm, et al. Smoothing Method for Approximate Extensive-Form Perfect Equilibrium, 2017, IJCAI.

[4] Milan Hladík, et al. Refining Subgames in Large Imperfect Information Games, 2016, AAAI.

[5] Tuomas Sandholm, et al. Simultaneous Abstraction and Equilibrium Finding in Games, 2015, IJCAI.

[6] Tuomas Sandholm, et al. Regret-Based Pruning in Extensive-Form Games, 2015, NIPS.

[7] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[8] Tuomas Sandholm, et al. Solving Imperfect-Information Games via Discounted Regret Minimization, 2018, AAAI.

[9] Tuomas Sandholm, et al. Practical exact algorithm for trembling-hand equilibrium refinements in games, 2018, NeurIPS.

[10] H. Brendan McMahan, et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization, 2011, AISTATS.

[11] Tuomas Sandholm, et al. Safe and Nested Subgame Solving for Imperfect-Information Games, 2017, NIPS.

[12] S. Hart, et al. A simple adaptive procedure leading to correlated equilibrium, 2000.

[13] Tuomas Sandholm, et al. Strategy-Based Warm Starting for Regret Minimization in Games, 2016, AAAI.

[14] Javier Peña, et al. Smoothing Techniques for Computing Nash Equilibria of Sequential Games, 2010, Math. Oper. Res.

[15] Tuomas Sandholm, et al. Regret Transfer and Parameter Optimization, 2014, AAAI.

[16] Kevin Waugh, et al. Solving Large Extensive-Form Games with Strategy Constraints, 2018, AAAI.

[17] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[18] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.

[19] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.

[20] Kevin Waugh, et al. Faster algorithms for extensive-form game solving via improved smoothing functions, 2018, Mathematical Programming.

[21] Tuomas Sandholm, et al. Dynamic Thresholding and Pruning for Regret Minimization, 2017, AAAI.

[22] Nicola Gatti, et al. Extensive-Form Perfect Equilibrium Computation in Two-Player Games, 2017, AAAI.

[23] Kevin Waugh, et al. Faster First-Order Methods for Extensive-Form Game Solving, 2015, EC.

[24] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.

[25] J. Zico Kolter, et al. What game are we playing? End-to-end learning in normal and extensive form games, 2018, IJCAI.

[26] Tuomas Sandholm, et al. Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning, 2017, ICML.

[27] Stephen P. Boyd, et al. Disciplined Convex Programming, 2006.

[28] Tuomas Sandholm, et al. Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games, 2018, AAAI.

[29] Michael H. Bowling, et al. Solving Imperfect Information Games Using Decomposition, 2013, AAAI.

[30] Tuomas Sandholm, et al. Regret Minimization in Behaviorally-Constrained Zero-Sum Games, 2017, ICML.

[31] Michael H. Bowling, et al. Solving Heads-Up Limit Texas Hold'em, 2015, IJCAI.

[32] Kevin Waugh, et al. Monte Carlo Sampling for Regret Minimization in Extensive Games, 2009, NIPS.

[33] Tuomas Sandholm, et al. Endgame Solving in Large Imperfect-Information Games, 2015, AAAI Workshop: Computer Poker and Imperfect Information.