Combining Incremental Strategy Generation and Branch and Bound Search for Computing Maxmin Strategies in Imperfect Recall Games

Extensive-form games with imperfect recall are an important model of dynamic games in which the players are allowed to forget previously known information. Imperfect recall games often arise as the result of an abstraction algorithm that simplifies a large game with perfect recall. Unfortunately, solving an imperfect recall game poses fundamental problems, since a Nash equilibrium does not have to exist. Alternatively, we can seek maxmin strategies that guarantee a worst-case expected outcome. The only existing algorithm for computing maxmin strategies in two-player imperfect recall games without absentmindedness, however, requires approximating a bilinear mathematical program whose size is proportional to the size of the whole game, and thus has limited scalability. We propose a novel algorithm for computing maxmin strategies in this class of games that combines this approximate algorithm with an incremental strategy-generation technique designed previously for extensive-form games with perfect recall. Experimental evaluation shows that the novel algorithm builds only a fraction of the game tree and improves scalability by several orders of magnitude. Finally, we demonstrate that our algorithm can solve an abstracted variant of a large game faster than algorithms operating on the unabstracted perfect-recall variant.
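
As background for the maxmin solution concept referenced above, the value that the maximizing player can guarantee is defined in the standard way (this formalization is added here for clarity and uses generic notation rather than the paper's own):

v^* = \max_{b_1 \in B_1} \min_{b_2 \in B_2} u_1(b_1, b_2)

where B_i denotes the set of behavioral strategies of player i and u_1 is the expected utility of player 1. With imperfect recall, the expected utility can in general no longer be written as a linear function of a compact (sequence-form) strategy representation, which is why the underlying optimization problem referred to in the abstract becomes bilinear.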
