Reinforcement learning in board games

This project investigates the application of the TD(λ) reinforcement learning algorithm and neural networks to the problem of producing an agent that can play board games. It surveys the progress made in this area over the last decade and extends it by suggesting new possibilities for improvement, based upon theoretical and past empirical evidence. This includes the identification and formalization (for the first time) of key game properties that are important for TD-learning, and a discussion of different methods of generating training data. Also included is the development of a TD-learning game system (including a game-independent benchmarking engine) capable of learning any zero-sum two-player board game. The primary purpose of this system is to allow potential improvements to be tested and compared in a standardized fashion. Experiments have been conducted with this system, using the games Tic-Tac-Toe and Connect 4, to examine a number of potential improvements.
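The core update the abstract refers to can be illustrated with a minimal sketch of tabular TD(λ) with accumulating eligibility traces. This is an assumption-laden toy, not the project's actual system: the function name, the dictionary-based value table, and the episode format are all illustrative, and a real board-game agent would replace the table with a neural network evaluation function.

```python
# Hedged sketch of tabular TD(lambda) with accumulating eligibility traces.
# All names here are illustrative; the project described above uses a neural
# network rather than a lookup table, but the update rule is the same idea.

def td_lambda_episode(V, states, rewards, alpha=0.1, gamma=1.0, lam=0.8):
    """Apply TD(lambda) updates over one finished episode.

    V       -- dict mapping state -> estimated value (updated in place)
    states  -- visited states s_0 .. s_T (terminal state last)
    rewards -- rewards r_1 .. r_T, one per transition
    """
    e = {}  # eligibility traces, one per visited state
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        # TD error: reward plus discounted next-state value, minus current estimate
        delta = rewards[t] + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
        e[s] = e.get(s, 0.0) + 1.0  # accumulating trace for the current state
        for k in list(e):
            V[k] = V.get(k, 0.0) + alpha * delta * e[k]
            e[k] *= gamma * lam  # decay all traces toward zero
    return V
```

For example, after one episode `["a", "b", "win"]` with rewards `[0.0, 1.0]`, the win credit propagates to `"b"` directly and, via its decayed trace, partially back to `"a"`.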
