Large-Scale Optimization for Evaluation Functions with Minimax Search

This paper presents a new method, Minimax Tree Optimization (MMTO), for learning the heuristic evaluation function of a practical alpha-beta search program. The evaluation function may be a linear or non-linear combination of weighted features, and the weights are the parameters to be optimized. To control the search results so that the move decisions agree with the game records of human experts, we design a well-modeled objective function to be minimized. A numerical iterative method then finds local minima of this objective function, adjusting more than forty million parameters with only a small number of hyperparameters. The method was applied to shogi, a major chess variant in which the evaluation function must handle a larger state space than in chess. Experimental results show that large-scale optimization of the evaluation function improves the playing strength of shogi programs, and that the new method performs significantly better than existing methods. Implementing the new method in our shogi program Bonanza contributed substantially to the program's first-place finish in the 2013 World Computer Shogi Championship. We also present preliminary evidence that the method applies to other two-player games such as chess.
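A minimal sketch of the MMTO idea follows, under two simplifying assumptions: a linear evaluation w·φ stands in for the shallow minimax search score that the full method differentiates through, and the objective's constraint and penalty terms are collapsed into a single L1 penalty. All names (mmto_loss_and_grad, mmto_step, K, h) are hypothetical illustrations, not taken from Bonanza's implementation.

```python
import numpy as np

K = 5.0  # sharpness of the smooth step T(x); larger K counts disagreements more crisply

def smooth_step(x):
    """T(x): approaches 1 when a non-expert move out-scores the expert move."""
    return 1.0 / (1.0 + np.exp(-K * x))

def mmto_loss_and_grad(w, positions, l1=1e-4):
    """positions: list of (expert_phi, other_phis) pairs, where each entry is the
    feature vector of the position reached by that move (a stand-in for the
    search-based scores used in the actual method)."""
    loss, grad = 0.0, np.zeros_like(w)
    for expert_phi, other_phis in positions:
        s_expert = w @ expert_phi
        for phi in other_phis:
            diff = w @ phi - s_expert           # > 0 when a non-expert move looks better
            t = smooth_step(diff)
            loss += t
            grad += K * t * (1.0 - t) * (phi - expert_phi)  # gradient of T(diff) in w
    loss += l1 * np.sum(np.abs(w))              # stand-in for the constraint/penalty terms
    grad += l1 * np.sign(w)
    return loss, grad

def mmto_step(w, positions, h=1e-3):
    """One iteration: move each weight a fixed step against the gradient sign,
    a cheap update that scales to tens of millions of weights."""
    _, grad = mmto_loss_and_grad(w, positions)
    return w - h * np.sign(grad)

# Toy usage: eight features, one position, the expert move vs. three alternatives.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
positions = [(rng.normal(size=8), [rng.normal(size=8) for _ in range(3)])]
for _ in range(100):
    w = mmto_step(w, positions)
```

The sign-based fixed step mirrors the abstract's claim that a small number of hyperparameters suffices: only the step size h and the penalty weight need tuning, regardless of how many feature weights are optimized.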
