Smooth UCT Search in Computer Poker

Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two-player games. Although these methods have been extended to imperfect-information games, they have so far matched neither the practical success nor the theoretical convergence guarantees of competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm. Smooth UCT agents mix in their average policy during self-play, and the resulting planning process resembles game-theoretic fictitious play. When applied to Kuhn and Leduc poker, Smooth UCT approached a Nash equilibrium, whereas UCT diverged. In addition, Smooth UCT outperformed UCT in Limit Texas Hold'em and won three silver medals in the 2014 Annual Computer Poker Competition.
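The idea of mixing an average policy into UCT can be illustrated with a minimal sketch. The abstract does not specify the mixing rule, so everything below is an assumption made for illustration: the `Node` class, the helper names, the fixed mixing probability `eta`, and the exploration constant `c` are all hypothetical, and the average policy is taken to be the empirical distribution of past action choices at a node.

```python
import math
import random


class Node:
    """Per-information-state statistics (hypothetical structure)."""

    def __init__(self, num_actions):
        self.visits = [0] * num_actions    # how often each action was chosen
        self.values = [0.0] * num_actions  # running mean reward per action

    def total_visits(self):
        return sum(self.visits)


def uct_action(node, c=1.0):
    """Standard UCB1 selection: mean value plus an exploration bonus."""
    for a, n in enumerate(node.visits):
        if n == 0:
            return a  # try every action once before using the bonus
    total = node.total_visits()
    scores = [
        node.values[a] + c * math.sqrt(math.log(total) / node.visits[a])
        for a in range(len(node.visits))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def average_policy(node):
    """Empirical frequency of past choices (assumed form of the average policy)."""
    total = node.total_visits()
    k = len(node.visits)
    if total == 0:
        return [1.0 / k] * k
    return [n / total for n in node.visits]


def smooth_uct_action(node, eta=0.5, c=1.0, rng=random):
    """With probability eta act like UCT, otherwise sample the average policy.

    A fixed eta is an assumption of this sketch, not a detail from the abstract.
    """
    if rng.random() < eta:
        return uct_action(node, c)
    probs = average_policy(node)
    return rng.choices(range(len(probs)), weights=probs)[0]


def update(node, action, reward):
    """Incremental mean update after a simulated episode."""
    node.visits[action] += 1
    node.values[action] += (reward - node.values[action]) / node.visits[action]
```

Setting `eta=1.0` in this sketch reduces it to plain UCT selection, while smaller values make the agent's behaviour during self-play closer to its own historical average, which is the property the abstract relates to fictitious play.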
