Reduced Space and Faster Convergence in Imperfect-Information Games via Regret-Based Pruning

Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games. Regret-Based Pruning (RBP) is an improvement that allows poorly performing actions to be temporarily pruned, thus speeding up CFR. We introduce Total RBP, a new form of RBP that reduces the space requirements of CFR as actions are pruned. We prove that in zero-sum games Total RBP asymptotically prunes any action that is not part of a best response to some Nash equilibrium. This leads to provably faster convergence and lower space requirements. Experiments show that Total RBP yields an order-of-magnitude reduction in space, and that the reduction factor increases with game size.
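To make the idea of pruning "poorly performing" actions concrete, here is a minimal sketch of regret matching, the per-decision-point update that CFR is built on. It is not the Total RBP algorithm itself. The function names are hypothetical, and treating negative cumulative regret as the pruning criterion is an assumption made for illustration.

```python
from typing import List


def regret_matching_strategy(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret.

    If no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def prunable_actions(cum_regret: List[float]) -> List[int]:
    """Indices of actions with negative cumulative regret.

    Assumption for illustration: these are the candidates for temporary
    pruning, since regret matching assigns them probability zero whenever
    some other action has positive regret.
    """
    return [i for i, r in enumerate(cum_regret) if r < 0.0]


if __name__ == "__main__":
    regrets = [3.0, -2.0, 1.0]  # made-up cumulative regrets for three actions
    print(regret_matching_strategy(regrets))
    print(prunable_actions(regrets))
```

In this toy example the second action has negative regret and is played with probability zero, so skipping its subtree for some iterations would not change the current strategy. A scheme of this kind could in principle save space as well as time by not storing regrets for pruned actions; how Total RBP achieves that is beyond this sketch.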
