First Results from Using Temporal Difference Learning in Shogi

This paper describes first results from applying Temporal Difference learning [1] to shogi. We report on experiments to determine whether sensible values for shogi pieces can be obtained in the same manner as for western chess pieces [2]. Learning comes entirely from randomised self-play, without access to any form of expert knowledge. The piece values are used in a simple search program that chooses shogi moves from a shallow lookahead, evaluating the leaves with the piece values and breaking ties randomly at the top level. Temporal Difference learning adjusts the piece values over the course of a series of games. The method succeeds in learning values that perform well in matches against hand-crafted values.
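The learning loop described above can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes a linear material evaluation (each piece value times the material difference for that piece type), a logistic squashing of that score, and the standard TD(λ) update with accumulating eligibility traces and no discounting. The piece list, the function names and the `alpha`/`lam` settings are hypothetical. The shallow lookahead itself is omitted. `choose_move` only illustrates the random tie-break among equally scored moves at the top level.

```python
import math
import random

# Hypothetical feature layout: material difference (own - opponent) per piece
# type. A real shogi program would also count pieces in hand and promoted pieces.
PIECES = ["pawn", "lance", "knight", "silver", "gold", "bishop", "rook"]


def evaluate(weights, features):
    """Linear material score: sum of piece value * material difference."""
    return sum(w * f for w, f in zip(weights, features))


def squash(x):
    """Map a material score into (0, 1) with a logistic function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def choose_move(scored_moves, rng):
    """Return a best-scoring move, breaking ties at random."""
    best = max(score for _, score in scored_moves)
    candidates = [move for move, score in scored_moves if score == best]
    return rng.choice(candidates)


def td_lambda_update(weights, positions, outcome, alpha=0.05, lam=0.7):
    """One online TD(lambda) pass over the positions of a single game.

    positions: feature vectors, all from one fixed player's point of view.
    outcome:   final result for that player (1.0 win, 0.5 draw, 0.0 loss).
    Returns the updated piece values.
    """
    w = list(weights)
    trace = [0.0] * len(w)
    for t, feats in enumerate(positions):
        p_t = squash(evaluate(w, feats))
        # d/dw of squash(w . f) is p * (1 - p) * f.
        grad = [p_t * (1.0 - p_t) * f for f in feats]
        # Accumulating eligibility trace.
        trace = [lam * e + g for e, g in zip(trace, grad)]
        # Target: prediction for the next position, or the game result at the end.
        if t + 1 < len(positions):
            target = squash(evaluate(w, positions[t + 1]))
        else:
            target = outcome
        delta = target - p_t
        w = [wi + alpha * delta * ei for wi, ei in zip(w, trace)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.0] * len(PIECES)
    # Toy "game": the player is a rook up in every position and wins.
    rook_up = [0, 0, 0, 0, 0, 0, 1]
    values = td_lambda_update(values, [rook_up] * 3, outcome=1.0)
    print(dict(zip(PIECES, values)))
    print(choose_move([("a", 1), ("b", 3), ("c", 3)], rng))
```

In the toy game only the rook value moves, and it rises because the rook-up side won. Over many self-play games, the accumulated updates would adjust every piece value in the same way. Squashing the score keeps the predictions on the same scale as the game result, which is what the final TD target compares against.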

Donald F. Beal | Martin C. Smith

[1] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.

[2] Richard E. Korf, et al. A Unified Theory of Heuristic Evaluation Functions and its Application to Learning, 1986, AAAI.

[3] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Machine Learning.

[4] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(λ) with Game-Tree Search, 1998, ICML.

[5] Donald F. Beal, et al. Learning Piece Values Using Temporal Differences, 1997, J. Int. Comput. Games Assoc.

[6] Robert Levinson, et al. Adaptive Pattern-Oriented Chess, 1991, AAAI.

[7] Christian Donninger, et al. Null Move and Deep Search, 1993, J. Int. Comput. Games Assoc.

[8] J. Fairbairn. Shogi for Beginners, 1984.

[9] Tony Marsland, et al. Computer Chess and Search, 1992.

[10] Hiroyuki Iida, et al. Natural Developments in Game Research, 1996, J. Int. Comput. Games Assoc.