Evaluation of Game Tree Search Methods by Game Records

This paper presents a method for evaluating game tree search methods, including standard minimax search with heuristic evaluation functions and Monte Carlo tree search, which has recently brought drastic improvements in the strength of computer Go programs. The basic idea is to use the average win probability of positions that have similar evaluation values. Measures of how accurately evaluation values reflect win probabilities can then be used to assess the performance of a game tree search method. If the evaluation values are produced by a good game tree search method, a plot of win probability against evaluation value should be consistent and monotonic. By checking whether the plot has these properties for particular subsets of positions, we can detect specific deficiencies in the search method. We applied our method to Go, Shogi, and Chess, and by comparing the results with the empirical understanding of the performance of various game tree search methods and with the results of self-play, we show that our method is efficient and effective.
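The core computation described above (grouping positions by evaluation value and averaging their outcomes from game records) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each sample is assumed to be a pair of (evaluation value, game outcome for the evaluated side, 1 for a win, 0 for a loss, 0.5 for a draw), "similar evaluation values" is assumed to mean fixed-width bins, and monotonicity is operationalized here as the number of adjacent bins whose average win rate drops. The function names `win_probability_curve` and `monotonicity_violations` and the parameters `bin_width` and `min_count` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable, List, Tuple

# (bin centre, average win rate, number of samples in the bin)
CurvePoint = Tuple[float, float, int]


def win_probability_curve(
    samples: Iterable[Tuple[float, float]], bin_width: float
) -> List[CurvePoint]:
    """Group (evaluation value, outcome) pairs into fixed-width bins
    (an assumed definition of "similar evaluation values") and return the
    average outcome of each non-empty bin, sorted by evaluation value."""
    wins = defaultdict(float)
    counts = defaultdict(int)
    for value, outcome in samples:
        b = math.floor(value / bin_width)  # index of the bin containing value
        wins[b] += outcome
        counts[b] += 1
    return [
        ((b + 0.5) * bin_width, wins[b] / counts[b], counts[b])
        for b in sorted(counts)
    ]


def monotonicity_violations(curve: List[CurvePoint], min_count: int = 1) -> int:
    """Count adjacent pairs of bins (with at least min_count samples each)
    where the win rate decreases although the evaluation value increases."""
    rates = [rate for _, rate, n in curve if n >= min_count]
    return sum(1 for lo, hi in zip(rates, rates[1:]) if hi < lo)


if __name__ == "__main__":
    # Hypothetical (evaluation value, outcome) pairs taken from game records.
    samples = [(-300, 0), (-250, 0), (-50, 1), (-20, 0),
               (10, 1), (40, 0), (260, 1), (310, 1)]
    curve = win_probability_curve(samples, bin_width=100.0)
    for centre, rate, n in curve:
        print(f"{centre:8.1f}  {rate:.2f}  (n={n})")
    print("monotonicity violations:", monotonicity_violations(curve))
```

To inspect a particular subset of positions, as the abstract suggests, one would filter the samples first (for example by game phase or by position type, both hypothetical choices here) and compute a separate curve for each subset. The `min_count` threshold is there because bins with very few positions give noisy averages that can produce spurious violations.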
