Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting

Building a near-perfect chess-playing agent has been a long-standing challenge in Artificial Intelligence, and recent advances suggest that this goal is within reach. In this project, we present methods for faster training of self-play algorithms, give the mathematical details of the algorithm used, outline potential future directions, and survey relevant prior work in computer chess. Deep Pepper uses embedded chess knowledge to accelerate training of the engine relative to a "tabula rasa" system such as AlphaZero. We also release our code to promote further research.
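To make the training setup concrete, the sketch below illustrates an expert-iteration-style self-play loop in the spirit described above. It is a minimal illustration under stated assumptions, not the Deep Pepper implementation: it assumes the python-chess package, uses a hand-coded material evaluation (standing in for the "embedded knowledge" that would otherwise seed a learned value network), and replaces full MCTS with a toy one-ply softmax search. The names `evaluate`, `search_policy`, and `self_play_game` are hypothetical.

```python
# Minimal sketch of an expert-iteration-style self-play loop (illustrative
# only, NOT the Deep Pepper implementation). Assumes the python-chess
# package. `evaluate` is a hand-coded material count standing in for a
# learned value network; `search_policy` is a toy one-ply softmax "expert"
# standing in for full MCTS.
import math
import random

import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def evaluate(board: chess.Board) -> float:
    """Material balance from White's perspective (embedded knowledge)."""
    score = 0.0
    for piece, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece, chess.WHITE))
        score -= value * len(board.pieces(piece, chess.BLACK))
    return score

def search_policy(board: chess.Board) -> dict:
    """Toy 'expert': softmax over one-ply evaluations for the side to move."""
    scores = {}
    for move in board.legal_moves:
        board.push(move)
        s = evaluate(board)
        board.pop()
        # Negate for Black, who prefers lower White-perspective scores.
        scores[move] = s if board.turn == chess.WHITE else -s
    best = max(scores.values())
    exps = {m: math.exp(s - best) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

def self_play_game(max_plies: int = 60) -> list:
    """Play one game, recording (FEN, search policy) training pairs."""
    board, examples = chess.Board(), []
    for _ in range(max_plies):
        if board.is_game_over():
            break
        pi = search_policy(board)
        examples.append((board.fen(), {m.uci(): p for m, p in pi.items()}))
        moves, probs = zip(*pi.items())
        board.push(random.choices(moves, weights=probs)[0])
    return examples
```

In a full expert-iteration pipeline, the (position, search-policy) pairs collected here would serve as supervised targets for the policy network, which in turn guides the next round of search; seeding `evaluate` with domain knowledge rather than learning it from scratch is the acceleration over a tabula rasa system that the abstract describes.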
