Monte Carlo Tree Search with Robust Exploration

This paper presents a new Monte-Carlo tree search method that focuses on identifying the best move. UCT, which minimizes cumulative regret, has achieved remarkable success in Go and other games. However, recent studies on simple regret show that better exploration strategies exist for identifying the best move. To further improve performance, the proposed method chooses the leaf to explore based not only on the mean reward but on the whole reward distribution. To obtain reliable distributions, we adopt a hybrid approach: a negamax-style backup of reward distributions is used in the shallower half of the search tree, and UCT is used in the rest of the tree. Experiments on synthetic trees show that the proposed method outperforms UCT and similar methods, except on trees with uniform width and depth.
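The two backup rules combined in the hybrid approach can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows UCB1 scoring as used by UCT, and a negamax-style backup of discrete reward distributions under the assumption that sibling values are independent. The discretization into K equally spaced bins on [0, 1], the function names, and the depth test used for the "shallower half" are all hypothetical choices made for illustration. The paper's actual rule for selecting a leaf from the distribution is not reproduced here.

```python
import math
from typing import List, Sequence

# Hypothetical discretization: a reward in [0, 1] is stored as a probability
# mass function (pmf) over K equally spaced points 0, 1/(K-1), ..., 1.
# Seen from the opponent's side a reward r becomes 1 - r, which maps
# bin i to bin K-1-i.


def ucb1(mean: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """UCB1 score used by UCT to choose a child (cumulative-regret oriented)."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def negate(pmf: Sequence[float]) -> List[float]:
    """Distribution of 1 - R when R has probability mass function `pmf`."""
    return list(reversed(pmf))


def cdf(pmf: Sequence[float]) -> List[float]:
    """Cumulative distribution: P(R <= bin i) for each bin i."""
    out, total = [], 0.0
    for p in pmf:
        total += p
        out.append(total)
    return out


def negamax_backup(child_pmfs: Sequence[Sequence[float]]) -> List[float]:
    """Distribution of max_i (1 - R_i), assuming the R_i are independent.

    For independent X_i, P(max_i X_i <= x) = prod_i P(X_i <= x), so the
    parent's CDF is the product of the negated children's CDFs.
    """
    k = len(child_pmfs[0])
    parent_cdf = [1.0] * k
    for pmf in child_pmfs:
        for i, f in enumerate(cdf(negate(pmf))):
            parent_cdf[i] *= f
    # Convert the CDF back into a probability mass function.
    return [parent_cdf[0]] + [parent_cdf[i] - parent_cdf[i - 1]
                              for i in range(1, k)]


def mean(pmf: Sequence[float]) -> float:
    """Expected reward of a pmf on the K-point grid over [0, 1]."""
    k = len(pmf)
    return sum(p * i / (k - 1) for i, p in enumerate(pmf))


def in_shallow_half(depth: int, max_depth: int) -> bool:
    """Hypothetical hybrid rule: distribution backup above half the depth,
    UCT (mean values and UCB1) below it."""
    return depth < max_depth / 2
```

For example, on the grid {0, 0.5, 1}, a child whose reward is 0 or 1 with equal probability and a child whose reward is certainly 0.5 back up to a parent pmf of [0.0, 0.5, 0.5] with mean 0.75. The mean alone would hide the fact that the parent's value is uncertain. A distribution-aware exploration rule can use exactly that information.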
