A Fast-Convergence Method of Monte Carlo Counterfactual Regret Minimization for Imperfect Information Dynamic Games

Among existing algorithms for solving imperfect-information extensive-form games, Monte Carlo Counterfactual Regret Minimization (MCCFR) and its variants are the most popular. However, MCCFR converges slowly because its sampled value estimates have high variance. In this paper, we introduce Semi-OS, a fast-convergence method derived from Outcome-Sampling MCCFR (OS), the most popular variant of MCCFR. Semi-OS makes two novel modifications to OS. First, it stores all histories and their values at each information set. Second, after each strategy update, it performs a full game-tree traversal to refresh these stored values. Together, these modifications yield better regret estimates. We show that, with an appropriately chosen discount rate, Semi-OS not only converges significantly faster than OS in Leduc Poker, a common testbed for imperfect-information games, but also statistically outperforms OS in 200,000-hand head-to-head matches of Leduc Poker.
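To make the two modifications concrete, the Python sketch below illustrates the bookkeeping they imply on a tiny hand-coded game: an `InfoSet` object that stores the histories it contains together with their value estimates, and a `full_traversal_refresh` routine that recomputes expected values over the whole tree after a strategy update and blends them into the stored values with a discount rate. The toy game, the node layout, and the exponential blending with `discount` are illustrative assumptions made here for exposition; this is not the paper's exact update rule.

```python
"""Minimal sketch of the Semi-OS bookkeeping described in the abstract.
Assumptions (not from the paper): the toy game, the InfoSet layout, and
the discount-rate blending used in full_traversal_refresh()."""

from dataclasses import dataclass, field

# --- toy game tree -------------------------------------------------------
# Chance deals card "A" or "B" to player 1 (hidden from player 2).
# Player 1 bets or checks; after a bet, player 2 calls or folds.
# Payoffs are from player 1's point of view (zero-sum).
TERMINAL = {
    ("A", "check"): 0.0, ("B", "check"): 0.0,
    ("A", "bet", "call"): 2.0, ("A", "bet", "fold"): 1.0,
    ("B", "bet", "call"): -2.0, ("B", "bet", "fold"): 1.0,
}
ACTIONS = {"p1": ["bet", "check"], "p2": ["call", "fold"]}


@dataclass
class InfoSet:
    """Modification 1: the information set stores every history it
    contains together with that history's current value estimate."""
    actions: list
    strategy: dict = None
    history_values: dict = field(default_factory=dict)  # history -> value

    def __post_init__(self):
        if self.strategy is None:
            self.strategy = {a: 1.0 / len(self.actions) for a in self.actions}


# Player 1 has one info set per card; player 2 has a single "bet" info set
# whose member histories are ("A", "bet") and ("B", "bet").
INFOSETS = {
    "p1:A": InfoSet(ACTIONS["p1"]),
    "p1:B": InfoSet(ACTIONS["p1"]),
    "p2:bet": InfoSet(ACTIONS["p2"]),
}


def infoset_of(history):
    """Map a history to the acting player's information-set key."""
    if len(history) == 1:            # player 1 to act, sees the card
        return "p1:" + history[0]
    return "p2:bet"                  # player 2 to act, card hidden


def value(history):
    """Expected value of a history under the current strategies."""
    if history in TERMINAL:
        return TERMINAL[history]
    node = INFOSETS[infoset_of(history)]
    return sum(p * value(history + (a,)) for a, p in node.strategy.items())


def full_traversal_refresh(discount=0.5):
    """Modification 2: after a strategy update, walk the whole game tree
    and blend fresh expected values into each info set's stored values.
    The exponential blending with `discount` is an assumed placeholder
    for the paper's discounting scheme."""
    stack = [("A",), ("B",)]         # start below the chance node
    while stack:
        h = stack.pop()
        if h in TERMINAL:
            continue
        node = INFOSETS[infoset_of(h)]
        old = node.history_values.get(h, 0.0)
        node.history_values[h] = discount * old + (1 - discount) * value(h)
        stack.extend(h + (a,) for a in node.actions)


if __name__ == "__main__":
    # One illustrative iteration: change a strategy, then refresh values.
    INFOSETS["p1:A"].strategy = {"bet": 0.9, "check": 0.1}
    full_traversal_refresh()
    print(INFOSETS["p2:bet"].history_values)
```

In this sketch, the stored per-history values at player 2's "bet" information set are what an OS-style regret update could draw on instead of a single sampled outcome; how those values enter the regret computation is specified by the paper, not reproduced here.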
