The Global Landscape of Objective Functions for the Optimization of Shogi Piece Values with a Game-Tree Search

The landscape of an objective function for the supervised learning of evaluation functions is investigated numerically for a small number of feature variables. Despite the importance of such learning methods, the properties of the objective function remain poorly understood because of its complicated dependence on millions of tree-search values. This paper shows that the objective function has multiple local minima and that its global minimum corresponds to reasonable feature values. Moreover, the function is continuous to within practically computable numerical accuracy, although it is not partially differentiable at points on the critical boundaries. An existing iterative method is shown to minimize the function from random initial values with great stability, but it can converge to an unreasonable local minimum when the initial values lie far from the desired ones. Furthermore, the minima obtained are shown to form a funnel structure.
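For concreteness, the sketch below illustrates one plausible form of such an objective, assuming a comparison-training loss over expert game records: piece values are penalized whenever a non-expert move scores close to or above the move actually played. The feature count, toy "true" piece values, synthetic data generator, and the use of static dot-product scores in place of genuine minimax search values are all illustrative assumptions, not the paper's actual setup. A derivative-free minimizer (Nelder-Mead) is chosen because, as noted above, the objective is continuous but not partially differentiable everywhere.

```python
# Minimal sketch of a comparison-training objective over game records.
# Hypothetical toy setup: static dot products stand in for minimax values.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

N_PIECE_TYPES = 8
TRUE_VALUES = np.array([1.0, 4.3, 5.0, 6.4, 6.9, 8.9, 10.4, 13.0])  # toy only

def sigmoid(x, scale=1.0):
    """Smooth step: approaches 1 when an alternative move outscores the expert move."""
    return 1.0 / (1.0 + np.exp(-x / scale))

def objective(free_values, positions):
    """J(v): sum, over positions and non-expert moves, of the loss incurred
    whenever a non-expert move scores near or above the expert move.
    The pawn value is anchored at 1.0 to fix the overall scale."""
    v = np.concatenate(([1.0], free_values))
    total = 0.0
    for phi_expert, phi_others in positions:
        gap = phi_others @ v - phi_expert @ v  # > 0: expert move is outscored
        total += sigmoid(gap).sum()
    return total

def make_positions(n=500, n_moves=20):
    """Synthetic 'game records': each position yields material-feature vectors
    for one expert move and n_moves alternatives, the expert move being the
    noisy argmax under TRUE_VALUES."""
    data = []
    for _ in range(n):
        phis = rng.normal(size=(n_moves + 1, N_PIECE_TYPES))
        scores = phis @ TRUE_VALUES + rng.normal(scale=0.5, size=n_moves + 1)
        best = int(np.argmax(scores))
        data.append((phis[best], np.delete(phis, best, axis=0)))
    return data

positions = make_positions()
v0 = rng.uniform(1.0, 15.0, size=N_PIECE_TYPES - 1)  # random initial values
# Derivative-free minimization, consistent with the non-differentiable points.
res = minimize(objective, v0, args=(positions,), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-4, "fatol": 1e-6})
print("recovered piece values:", np.round(np.concatenate(([1.0], res.x)), 2))
```

Anchoring the pawn at 1.0 removes the scale invariance of the sigmoid loss, under which uniformly inflating all piece values would otherwise lower the objective without changing move preferences.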
