Paper Information - Fictitious Self-Play in Extensive-Form Games

Fictitious Self-Play in Extensive-Form Games

Fictitious play is a popular game-theoretic model of learning in games. However, it has received little attention in practical applications to large problems. This paper introduces two variants of fictitious play that are implemented in behavioural strategies of an extensive-form game. The first variant is a full-width process that is realization equivalent to its normal-form counterpart and therefore inherits its convergence guarantees. However, its computational requirements are linear in time and space rather than exponential. The second variant, Fictitious Self-Play, is a machine learning framework that implements fictitious play in a sample-based fashion. Experiments in imperfect-information poker games compare our approaches and demonstrate their convergence to approximate Nash equilibria.
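For orientation, the classical normal-form process that both variants build on can be sketched in a few lines. This is not the paper's extensive-form algorithm or Fictitious Self-Play; it is a minimal illustration of ordinary fictitious play on a two-player zero-sum matrix game, assuming NumPy is available. The function names `fictitious_play` and `exploitability`, the iteration count, and the choice of matching pennies as the test game are all illustrative assumptions.

```python
import numpy as np


def fictitious_play(payoff, iterations=10000):
    """Classical fictitious play for a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player gets -payoff[i, j].
    Returns the empirical (average) strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial actions (an assumption made for this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations):
        # Each player best-responds to the opponent's empirical action counts.
        # Both responses are computed before either count is updated.
        row_br = np.argmax(payoff @ col_counts)
        col_br = np.argmin(row_counts @ payoff)
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    return np.max(payoff @ col_strategy) - np.min(row_strategy @ payoff)


# Matching pennies: the unique equilibrium mixes both actions equally.
matching_pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
row_avg, col_avg = fictitious_play(matching_pennies)
gap = exploitability(matching_pennies, row_avg, col_avg)
```

In this sketch the average strategies, not the individual best responses, are what approach equilibrium, and `gap` measures how far the averages are from it. The matrix representation is also what makes the normal-form process expensive for extensive-form games, since the number of pure strategies can grow exponentially with the size of the game tree.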
