Paper Information - Smooth UCT Search in Computer Poker

Smooth UCT Search in Computer Poker

Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm. Smooth UCT agents mix in their average policy during self-play and the resulting planning process resembles game-theoretic fictitious play. When applied to Kuhn and Leduc poker, Smooth UCT approached a Nash equilibrium, whereas UCT diverged. In addition, Smooth UCT outperformed UCT in Limit Texas Hold'em and won 3 silver medals in the 2014 Annual Computer Poker Competition.
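The mixing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only says that agents mix their average policy into UCT during self-play. The sketch assumes that the average policy is the normalized visit counts at a node, that the mixing is controlled by a fixed probability `eta`, and that `c` is the usual UCT exploration constant. The names `NodeStats`, `uct_action`, `average_policy`, `smooth_uct_action`, and `update` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Per-node statistics for one player (hypothetical structure)."""
    visits: list   # visit count per action
    values: list   # running mean return per action


def uct_action(stats, c):
    """Standard UCB1 selection: mean value plus an exploration bonus."""
    # Try every action once before applying the UCB formula.
    for action, n in enumerate(stats.visits):
        if n == 0:
            return action
    total = sum(stats.visits)
    scores = [
        q + c * math.sqrt(math.log(total) / n)
        for q, n in zip(stats.values, stats.visits)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def average_policy(stats):
    """Assumed average policy: visit counts normalized to probabilities."""
    total = sum(stats.visits)
    if total == 0:
        return [1.0 / len(stats.visits)] * len(stats.visits)
    return [n / total for n in stats.visits]


def smooth_uct_action(stats, eta, c, rng):
    """With probability eta act by UCT, otherwise sample the average policy.

    A fixed eta is an assumption of this sketch, not something the
    abstract specifies.
    """
    if rng.random() < eta:
        return uct_action(stats, c)
    probs = average_policy(stats)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def update(stats, action, reward):
    """Incremental mean update after a simulated episode."""
    stats.visits[action] += 1
    stats.values[action] += (reward - stats.values[action]) / stats.visits[action]
```

Setting `eta = 1.0` recovers plain UCT selection, while smaller values make self-play opponents face something closer to each other's average behavior, which is the resemblance to fictitious play that the abstract describes.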
