Temporal Difference Learning Applied to a High-Performance Game-Playing Program

The temporal difference (TD) learning algorithm offers the hope that the arduous task of manually tuning the evaluation function weights of game-playing programs can be automated. With one exception (TD-Gammon), TD learning has not been demonstrated to be effective in a high-performance, world-class game-playing program. Further, game-program developers have expressed doubt that learned weights could compete with the best hand-tuned weights. Chinook is the World Man-Machine Checkers Champion, and its evaluation function weights were hand-tuned over 5 years. This paper shows that TD learning is capable of competing with the best human effort.
