Solving Large Extensive-Form Games with Strategy Constraints

Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
