Giraffe: Using Deep Reinforcement Learning to Play Chess

This report presents Giraffe, a chess engine that uses self-play to discover all of its domain-specific knowledge, with minimal hand-crafted knowledge supplied by the programmer. Unlike previous attempts, which used machine learning only to tune the parameters of hand-crafted evaluation functions, Giraffe's learning system also performs automatic feature extraction and pattern recognition. The trained evaluation function performs comparably to the evaluation functions of state-of-the-art chess engines, all of which contain thousands of lines of carefully hand-crafted pattern recognizers, tuned over many years by both computer chess experts and human chess masters. Giraffe is the most successful attempt thus far at using end-to-end machine learning to play chess.
