Analysis of Evaluation-Function Learning by Comparison of Sibling Nodes

This paper analyzes the gradients of search values with respect to the parameter vector θ of an evaluation function. Recent learning methods for evaluation functions in computer shogi are based on minimizing an objective function defined over search results. The gradient of the evaluation function at the leaf position of a principal variation (PV) is commonly used as a convenient substitute for the gradient of the search result. By analyzing how the min-max value varies with θ, we show (1) when the min-max value is partially differentiable and (2) how the substitution may introduce errors. Experiments on a shogi program with about a million parameters show how frequently such errors occur, as well as how effective the substitution is for parameter tuning in practice.
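A minimal sketch of the substitution described above, not taken from the paper: for a toy min-max tree whose leaf evaluations are linear in θ (eval(x) = θ·x, so ∂eval/∂θ = x), the finite-difference gradient of the root search value matches the feature vector of the PV leaf wherever the PV is stable. All names, the tree, and the feature vectors are hypothetical.

```python
def evaluate(theta, features):
    # Linear evaluation: theta . features, so its gradient w.r.t. theta is features.
    return sum(t * f for t, f in zip(theta, features))

def minimax(node, theta, maximizing):
    """Return (min-max value, feature vector of the PV leaf).

    A node is either ("leaf", features) or ("node", [children]).
    """
    if node[0] == "leaf":
        return evaluate(theta, node[1]), node[1]
    results = [minimax(child, theta, not maximizing) for child in node[1]]
    pick = max if maximizing else min
    return pick(results, key=lambda r: r[0])

theta = [0.5, -0.2]
tree = ("node", [
    ("node", [("leaf", [1.0, 2.0]), ("leaf", [3.0, -1.0])]),
    ("leaf", [0.0, 1.0]),
])

value, pv_features = minimax(tree, theta, maximizing=True)

# Finite-difference gradient of the search value with respect to theta.
eps = 1e-6
grad = []
for i in range(len(theta)):
    bumped = list(theta)
    bumped[i] += eps
    grad.append((minimax(tree, bumped, True)[0] - value) / eps)

# Where the PV is stable, grad coincides with pv_features ([1.0, 2.0] here);
# the min-max value is only piecewise linear in theta, so at a tie between
# sibling nodes the derivative does not exist and the substitution errs.
print(value, pv_features, grad)
```

The toy example also hints at the paper's caveat: the root value is a max/min of linear functions of θ, so it is only partially differentiable, and when two siblings tie the PV (and hence the substituted gradient) switches discontinuously.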