Paper information - Temporal Difference Learning Applied to a High-Performance Game-Playing Program

The temporal difference (TD) learning algorithm offers the hope that the arduous task of manually tuning the evaluation function weights of game-playing programs can be automated. With one exception (TD-Gammon), TD learning has not been demonstrated to be effective in a high-performance, world-class game-playing program. Further, game-program developers have expressed doubt that learned weights could compete with the best hand-tuned weights. Chinook is the World Man-Machine Checkers Champion; its evaluation function weights were hand-tuned over 5 years. This paper shows that TD learning is capable of competing with the best human effort.

[1] Carl Ebeling, et al. Measuring the Performance Potential of Chess Programs, 1990, Artif. Intell.

[2] Thomas S. Anantharaman, et al. A Statistical Study of Selective Min-Max Search in Computer Chess, 1991, J. Int. Comput. Games Assoc.

[3] Paul E. Utgoff, et al. Automatic Feature Generation for Problem Solving Systems, 1992, ML.

[4] Michael Buro. Statistical Feature Combination for the Evaluation of Game Positions, 1995, J. Artif. Intell. Res.

[5] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.

[6] Donald F. Beal, et al. Learning Piece Values Using Temporal Differences, 1997, J. Int. Comput. Games Assoc.

[7] Jonathan Schaeffer, et al. One jump ahead - challenging human supremacy in checkers, 1997, J. Int. Comput. Games Assoc.

[8] Andrew Tridgell, et al. Experiments in Parameter Learning Using Temporal Differences, 1998, J. Int. Comput. Games Assoc.

[9] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.

[10] Andrew Tridgell, et al. KnightCap: A Chess Program That Learns by Combining TD(lambda) with Game-Tree Search, 1998, ICML.

[11] Jürg Nievergelt, et al. The parallel search bench ZRAM and its applications, 1999, Ann. Oper. Res.

[12] Jack van Rijswijck, et al. Learning from Perfection. A Data Mining Approach to Evaluation Function Learning in Awari, 2000, Computers and Games.

[13] J. W. Romein, et al. Multigame - An Environment for Distributed Game-Tree Search, 2001.

[14] Michael Buro, et al. Improving heuristic mini-max search by supervised learning, 2002, Artif. Intell.