Monte Carlo Tree Search with heuristic evaluations using implicit minimax backups

Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic αβ search in games where good heuristic evaluations are difficult to obtain. In recent years, incorporating ideas from traditional minimax search into MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger playing performance in Kalah, Breakthrough, and Lines of Action.
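
The abstract does not spell out the update rules, so the following is only a minimal sketch of the idea of keeping the two sources of information separate. It assumes negamax-style values in [-1, 1], a linear blend of win rate and minimax value controlled by a weight `alpha`, and a UCB1-style exploration term with constant `c`. These choices, and all names (`Node`, `backup`, `select_child`), are illustrative assumptions rather than details taken from the paper.

```python
import math


class Node:
    """Search-tree node that keeps playout statistics and a minimax value apart.

    All values are from the viewpoint of the player who moved INTO this node,
    so a parent always prefers the child with the highest value.
    """

    def __init__(self, heuristic_value):
        self.visits = 0
        self.reward_sum = 0.0                 # accumulated playout results
        self.minimax_value = heuristic_value  # starts as the static evaluation
        self.children = []

    def win_rate(self):
        return self.reward_sum / self.visits if self.visits else 0.0


def backup(path, reward):
    """Propagate one playout result from the leaf back to the root.

    `path` lists the nodes from root to leaf; `reward` is the playout result
    from the viewpoint of the player who moved into the leaf.
    """
    for node in reversed(path):
        node.visits += 1
        node.reward_sum += reward
        if node.children:
            # Implicit minimax backup (negamax form): the best child value for
            # the player to move here is a loss of equal size for the opponent.
            node.minimax_value = -max(ch.minimax_value for ch in node.children)
        reward = -reward  # switch viewpoint one ply up


def select_child(parent, alpha=0.4, c=1.0):
    """Pick a child using a blend of win rate and backed-up minimax value.

    `alpha` and `c` are assumed tuning parameters, not values from the paper.
    """
    def score(child):
        if child.visits == 0:
            return math.inf  # try every child at least once
        blended = (1 - alpha) * child.win_rate() + alpha * child.minimax_value
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return blended + explore

    return max(parent.children, key=score)
```

In this sketch the heuristic evaluation never replaces a playout: it only seeds `minimax_value`, which is then refreshed along the path on every backup and consulted during selection. Setting `alpha` to 0 reduces the selection rule to plain UCB1 on win rates.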
