Pruning Playouts in Monte-Carlo Tree Search for the Game of Havannah

Monte-Carlo Tree Search (MCTS) is a popular technique for playing multi-player games. In this paper, we propose a new method to bias the playout policy of MCTS. The idea is to prune the decisions that seem “bad” (according to the previous iterations of the algorithm) before computing each playout, so that the playouts evaluate the moves estimated to be “good” more precisely. We have tested our improvement on the game of Havannah and compared it to several classic improvements. Our method outperforms both the classic version of MCTS (with the RAVE improvement) and the different MCTS playout policies that we experimented with.
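The abstract does not say exactly how “bad” decisions are detected or what happens to them. The Python code below is a minimal sketch of one plausible reading, not the authors' implementation. It assumes per-move win/visit statistics (`Stats`) accumulated over earlier iterations (for instance RAVE/AMAF statistics), a hypothetical win-rate `threshold`, and a hypothetical `min_visits` guard. It also assumes a Havannah-style playout that fills empty cells in a random order. Pruned moves are deferred to the end of that order rather than discarded, so a playout can still finish the game. This fallback is an assumption too. All names and parameters are illustrative.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical statistics from previous MCTS iterations: move -> (wins, visits).
Stats = Dict[int, Tuple[float, int]]


def win_rate(stats: Stats, move: int) -> float:
    """Empirical win rate of a move, or 0.5 (no information) if never sampled."""
    wins, visits = stats.get(move, (0.0, 0))
    return wins / visits if visits > 0 else 0.5


def prune_moves(legal_moves: Sequence[int], stats: Stats,
                threshold: float, min_visits: int) -> Tuple[List[int], List[int]]:
    """Split legal moves into (kept, pruned), preserving their input order.

    A move is pruned only if it was sampled at least `min_visits` times
    and its win rate is below `threshold`; poorly sampled moves are kept.
    """
    kept: List[int] = []
    pruned: List[int] = []
    for move in legal_moves:
        visits = stats.get(move, (0.0, 0))[1]
        if visits >= min_visits and win_rate(stats, move) < threshold:
            pruned.append(move)
        else:
            kept.append(move)
    return kept, pruned


def playout_order(legal_moves: Sequence[int], stats: Stats, threshold: float,
                  min_visits: int, rng: random.Random) -> List[int]:
    """Random fill order for one playout, computed before the playout starts.

    Kept moves come first in random order; pruned moves are only reached
    once every kept move has been played (assumed fallback).
    """
    kept, pruned = prune_moves(legal_moves, stats, threshold, min_visits)
    rng.shuffle(kept)
    rng.shuffle(pruned)
    return kept + pruned


if __name__ == "__main__":
    toy_stats: Stats = {0: (9.0, 10), 1: (1.0, 10), 2: (0.0, 1), 3: (6.0, 10)}
    kept, pruned = prune_moves([0, 1, 2, 3, 4], toy_stats,
                               threshold=0.4, min_visits=5)
    print(kept, pruned)  # [0, 2, 3, 4] [1]
    print(playout_order([0, 1, 2, 3, 4], toy_stats, 0.4, 5, random.Random(0)))
```

In the toy run, only move 1 is pruned. It has enough samples (10) and a win rate of 0.1, which is below the threshold. Move 2 has a win rate of 0 but only one sample, so the guard keeps it. Move 4 has no statistics at all, so it is also kept. With this design, pruning only affects moves the search has already seen often enough to judge. The `threshold` and `min_visits` values here are arbitrary tuning knobs and do not come from the paper.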

Fabien Teytaud | Joris Duguépéroux | Julien Dehos | Ahmad Mazyad