Memory-Augmented Monte Carlo Tree Search

This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), a new approach to exploiting generalization in online real-time search. The key idea of M-MCTS is to combine MCTS with a memory structure in which each entry contains information about a particular state. This memory is used to generate an approximate value estimate by combining the estimates of similar states. We show that the memory-based value approximation is better than the vanilla Monte Carlo estimate with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.
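The idea of combining the estimates of similar states can be illustrated with a minimal sketch. The abstract does not specify how states are represented or how estimates are weighted, so everything below is an assumption made for illustration: states are plain feature vectors, similarity is cosine similarity, and the k most similar memory entries are averaged with softmax weights. The names `cosine`, `memory_value`, `k`, and `temperature` are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def memory_value(query, memory, k=2, temperature=0.1):
    """Approximate the value of `query` from similar states stored in memory.

    `memory` is a list of (feature_vector, value_estimate) pairs.
    Assumed scheme (not from the paper): take the k entries most similar
    to `query` and average their values with softmax weights over similarity.
    """
    if not memory:
        raise ValueError("memory is empty")
    # Score every entry, then keep the k most similar ones.
    scored = sorted(
        ((cosine(query, feat), value) for feat, value in memory),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Subtract the best score before exponentiating for numerical stability.
    best = scored[0][0]
    weights = [math.exp((sim - best) / temperature) for sim, _ in scored]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, scored)) / total


# Two stored states with Monte Carlo value estimates 0.8 and 0.2.
memory = [([1.0, 0.0], 0.8), ([0.0, 1.0], 0.2)]
near_first = memory_value([1.0, 0.0], memory)   # dominated by the first entry
in_between = memory_value([1.0, 1.0], memory)   # both entries weighted equally
```

A lower `temperature` concentrates the weight on the most similar entry, while a higher one moves the result toward a plain average of the k neighbours.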
