Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data

Deep neural networks have been successfully applied to learning the board games Go, chess, and shogi without prior knowledge by making use of reinforcement learning. Although starting from zero knowledge has been shown to yield impressive results, it is associated with high computational costs, especially for complex games. In this paper, we present CrazyAra, a neural-network-based engine for the chess variant crazyhouse that is trained solely in a supervised manner. Crazyhouse has a higher branching factor than chess, and only limited data of lower quality is available compared to the data used for AlphaGo. Therefore, we focus on improving efficiency in multiple respects while relying on low computational resources. These improvements include modifications to the neural network design and training configuration, the introduction of a data normalization step, and a more sample-efficient Monte-Carlo tree search that is less prone to blunders. After training on 569,537 human games for 1.5 days, we achieve a move prediction accuracy of 60.4%. During development, versions of CrazyAra played professional human players. Most notably, CrazyAra achieved a four-to-one win over the 2017 crazyhouse world champion Justin Tan (aka LM Jann Lee), who is rated more than 400 Elo above the average player in our training set. Furthermore, we test the playing strength of CrazyAra on CPU against all thirteen participants of the second Crazyhouse Computer Championships 2017, winning against twelve of them. Finally, for CrazyAraFish, we continue training our model on generated engine games. In ten long-time-control matches against Stockfish 10, CrazyAraFish wins three games and draws one.
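The search described here is an AlphaZero-style Monte-Carlo tree search guided by the policy and value outputs of a neural network. As a rough, self-contained sketch of the standard PUCT selection rule that such engines build on (not the paper's exact, modified variant; the `Node` class and the `c_puct` constant below are illustrative assumptions):

```python
import math

# Minimal sketch of the standard PUCT selection rule used in
# AlphaZero-style engines; the paper's own MCTS adds sample-efficiency
# modifications that are not reproduced here. Names are hypothetical.

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # accumulated backed-up values W(s, a)
        self.children = {}        # move -> Node

    def q_value(self):
        # Mean action value Q(s, a); defined as 0 for unvisited children.
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=2.5):
    """Pick the child move maximizing Q(s, a) + U(s, a)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_move, best_score = None, -math.inf
    for move, child in node.children.items():
        # Exploration bonus: high for high-prior, rarely visited moves.
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        score = child.q_value() + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Tiny usage example with two equally visited children: the move with
# the larger network prior receives the larger exploration bonus.
root = Node(prior=1.0)
root.children = {"e2e4": Node(prior=0.6), "d2d4": Node(prior=0.4)}
for child in root.children.values():
    child.visit_count = 1
print(select_child(root))  # -> "e2e4"
```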
