Evaluation of Monte Carlo tree search and the application to Go

Recent improvements to Monte Carlo tree search have produced strong computer Go programs. This paper presents a method for measuring the accuracy of Monte Carlo tree search in game programming: we take the win percentage of positions in a large database of game records as a benchmark and compare the win probability obtained by simulations against it. Applying the method to Monte Carlo tree search in Go revealed differences between search methods and their parameters, as well as the effect of position properties such as move number and the presence of stones under threat. The paper also introduces numerical metrics for evaluating the performance of search methods. Experiments in Go, as well as in Chess, Othello, and Shogi, showed that these metrics closely matched our empirical understanding of the performance of various search methods and their parameters.
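
As a minimal sketch of the comparison described above (the input format, the example values, and the choice of mean squared error and linear correlation as agreement measures are illustrative assumptions, not the metrics defined in the paper), one could pair each position's database win percentage with the win probability estimated by Monte Carlo simulations and summarize how closely the two agree:

```python
# Illustrative sketch only: compare simulation-based win-probability estimates
# against database win percentages for the same positions. The data values and
# the two summary measures below are assumptions for illustration.

def mean_squared_error(benchmark, estimates):
    """Average squared gap between database win rates and simulated win probabilities."""
    return sum((b - e) ** 2 for b, e in zip(benchmark, estimates)) / len(benchmark)

def pearson_correlation(xs, ys):
    """Linear correlation between benchmark win rates and simulated estimates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

if __name__ == "__main__":
    # Hypothetical values: database win percentages (benchmark) and
    # Monte Carlo win-probability estimates for five positions.
    benchmark = [0.55, 0.62, 0.48, 0.71, 0.33]
    estimates = [0.50, 0.65, 0.52, 0.68, 0.40]
    print("MSE:", mean_squared_error(benchmark, estimates))
    print("correlation:", pearson_correlation(benchmark, estimates))
```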
