Learning to Play Board Games using Temporal Difference Methods

A promising approach to learning to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) learning by self-play, (2) learning by playing against an expert program, and (3) learning from viewing experts play against each other. Although the third method generates high-quality games from the start, in contrast to the initially random games generated by self-play, its drawback is that the learning program is never allowed to test the moves it prefers. We compare these three methods using temporal difference learning on the game of backgammon. For particular games such as draughts and chess, learning from a large database of games played by human experts has the large advantage that no expensive lookahead planning is needed for move selection during the generation of (useful) training games. Experimental results in this paper show how useful this method is for learning to play chess and draughts.
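In such approaches the evaluation function is typically trained with temporal difference updates over the positions visited in a game, using the game outcome as the final target. The sketch below gives a minimal TD(lambda) weight update for a linear position evaluation function; the feature representation, the linear model, and the parameter values are illustrative assumptions, not the setup used in the paper (which trains neural network evaluators for backgammon, chess and draughts).

```python
def td_lambda_update(weights, episode_features, final_reward,
                     alpha=0.01, gamma=1.0, lam=0.7):
    """Apply TD(lambda) updates to a linear position evaluation function.

    weights:          current weight vector of the evaluation function
    episode_features: one feature vector per position visited in the game
    final_reward:     game outcome from the learner's perspective (e.g. 1 or 0)
    """
    def value(x):
        # Linear evaluation: dot product of weights and position features.
        return sum(w * f for w, f in zip(weights, x))

    eligibility = [0.0] * len(weights)
    for t in range(len(episode_features)):
        x_t = episode_features[t]
        # Target is the value of the successor position, or the game
        # outcome when the terminal position is reached.
        if t + 1 < len(episode_features):
            target = gamma * value(episode_features[t + 1])
        else:
            target = final_reward
        delta = target - value(x_t)  # TD error
        # Accumulating eligibility traces; for a linear model the gradient
        # of the value with respect to the weights is the feature vector.
        eligibility = [gamma * lam * e + f for e, f in zip(eligibility, x_t)]
        weights = [w + alpha * delta * e for w, e in zip(weights, eligibility)]
    return weights
```

The same update applies regardless of how the training games are generated: the sequence of feature vectors can come from self-play, from games against an expert program, or from a database of games played by experts.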
