Approximating Optimal Dudo Play with Fixed-Strategy Iteration Counterfactual Regret Minimization

Using the bluffing dice game Dudo as a challenge domain, we abstract information sets by means of imperfect recall of actions. Even with such abstraction, the standard Counterfactual Regret Minimization (CFR) algorithm proves impractical for Dudo, since the number of recursive visits to the same abstracted information sets increases exponentially with the depth of the game graph. By holding strategies fixed across each training iteration, we show how CFR training iterations may be transformed from an exponential-time recursive algorithm into a polynomial-time dynamic-programming algorithm, making computation of an approximate Nash equilibrium for the full 2-player game of Dudo possible for the first time.
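To make the transformation concrete, the following is a minimal structural sketch (in Python, not drawn from the paper) of one fixed-strategy training iteration over a DAG of abstracted information-set nodes. The Node fields, uniform chance outcomes, and regret updates are illustrative assumptions; a full implementation would also carry player-specific reach probabilities for proper counterfactual weighting. The key point it illustrates is that, with strategies held fixed, each node is visited exactly once per iteration in a forward/backward sweep rather than once per recursive path.

```python
# A minimal structural sketch (not the authors' implementation) of one
# fixed-strategy iteration: a forward/backward dynamic-programming sweep
# over a topological ordering of abstracted information-set nodes.
# Assumptions: uniform chance outcomes, self-reach used as the regret
# weight; a real trainer would track player-specific reach probabilities.

class Node:
    """An abstracted information-set node in a DAG-shaped game graph."""
    def __init__(self, name, num_actions, children=None, terminal_utility=None):
        self.name = name
        self.num_actions = num_actions
        self.children = children or {}          # action -> list of successor Nodes
        self.terminal_utility = terminal_utility
        self.regret_sum = [0.0] * num_actions
        self.strategy_sum = [0.0] * num_actions
        self.strategy = [1.0 / num_actions] * num_actions  # fixed for the whole iteration
        self.reach = 0.0                         # accumulated visit probability
        self.value = 0.0                         # expected utility, filled on backward pass

def fsicfr_iteration(topo_order, root):
    """One training iteration as a single pass over a topological ordering
    of the abstracted nodes, visiting each node exactly once."""
    for node in topo_order:
        node.reach = 0.0
    root.reach = 1.0

    # Forward pass: push reach probabilities down the DAG under fixed strategies.
    for node in topo_order:
        for a, succs in node.children.items():
            for child in succs:
                child.reach += node.reach * node.strategy[a] / len(succs)

    # Backward pass: pull expected values up and accumulate regrets.
    for node in reversed(topo_order):
        if node.terminal_utility is not None:
            node.value = node.terminal_utility
            continue
        action_values = [0.0] * node.num_actions
        for a, succs in node.children.items():
            action_values[a] = sum(c.value for c in succs) / len(succs)
        node.value = sum(p * v for p, v in zip(node.strategy, action_values))
        for a in range(node.num_actions):
            node.regret_sum[a] += node.reach * (action_values[a] - node.value)
            node.strategy_sum[a] += node.reach * node.strategy[a]

    # Strategies are recomputed only after the sweep (regret matching),
    # which is what keeps each node's visit count at one per iteration.
    for node in topo_order:
        positive = [max(r, 0.0) for r in node.regret_sum]
        total = sum(positive)
        node.strategy = ([r / total for r in positive] if total > 0
                         else [1.0 / node.num_actions] * node.num_actions)
```

Because the game graph is traversed as a DAG of abstracted nodes rather than as a tree of paths, the per-iteration cost grows with the number of nodes and edges instead of exponentially with depth.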