Enhancements in Monte Carlo tree search algorithms for biased game trees

Monte Carlo tree search (MCTS) algorithms have been applied to a variety of domains with remarkable success. However, it remains relatively unclear which game properties enhance or degrade the performance of MCTS. By contrast, the performance of classical minimax search is governed mainly by the size of the search space, including pruning efficiency, provided that a decent evaluation function is available. Existing research has shown that, for MCTS, the distribution of suboptimal moves and the non-uniformity of the tree shape matter more than the size of the state space. Our study shows that another property, bias in suboptimal moves, is also important, and we present an enhancement that handles such situations better. We focus on game trees in which the game-theoretical value is even but the suboptimal moves of one player tend to include more inferior moves than those of the opponent. We conducted experiments on a standard incremental tree model with various MCTS algorithms based on UCB1, KL-UCB, or Thompson sampling. The results show that bias in suboptimal moves degrades the performance of all of these algorithms and that our enhancement alleviates this effect.
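For readers unfamiliar with the three selection rules named above, the following is a minimal sketch of their common textbook forms, assuming win/loss (Bernoulli) playout rewards. It is not the enhancement proposed in this work, and all function names and the `(wins, visits)` layout are hypothetical choices made for illustration.

```python
import math
import random

def ucb1_index(wins, visits, total_visits):
    """UCB1: empirical mean plus sqrt(2 ln t / n) exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are tried first
    return wins / visits + math.sqrt(2.0 * math.log(total_visits) / visits)

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_ucb_index(wins, visits, total_visits, iterations=30):
    """KL-UCB: largest q >= mean with visits * KL(mean, q) <= ln t (bisection)."""
    if visits == 0:
        return math.inf
    mean = wins / visits
    budget = math.log(total_visits) / visits
    lo, hi = mean, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def thompson_index(wins, visits, rng=random):
    """Thompson sampling: draw from the Beta posterior under a uniform prior."""
    return rng.betavariate(wins + 1, visits - wins + 1)

def select_child(stats, rule="ucb1", rng=random):
    """Return the index of the child to descend into.

    stats is a list of (wins, visits) pairs, one per child (assumed layout).
    """
    total = max(1, sum(visits for _, visits in stats))
    if rule == "ucb1":
        scores = [ucb1_index(w, n, total) for w, n in stats]
    elif rule == "kl_ucb":
        scores = [kl_ucb_index(w, n, total) for w, n in stats]
    elif rule == "thompson":
        scores = [thompson_index(w, n, rng) for w, n in stats]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(len(stats)), key=scores.__getitem__)
```

In a full MCTS implementation, a function like `select_child` would be called at each node during the descent phase, with rewards stored from the perspective of the player to move at that node.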
