Mobile Networks for Computer Go

The architecture of the neural networks used in Deep Reinforcement Learning programs such as Alpha Zero or Polygames has been shown to have a great impact on the performances of the resulting playing engines. For example the use of residual networks gave a 600 ELO increase in the strength of Alpha Go. This paper proposes to evaluate the interest of Mobile Network for the game of Go using supervised learning as well as the use of a policy head and a value head different from the Alpha Zero heads. The accuracy of the policy, the mean squared error of the value, the efficiency of the networks with the number of parameters, the playing speed and strength of the trained networks are evaluated.

[1]  Tristan Cazenave Residual Networks for Computer Go , 2018, IEEE Transactions on Games.

[2]  Demis Hassabis,et al.  A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play , 2018, Science.

[3]  Rémi Coulom,et al.  Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search , 2006, Computers and Games.

[4]  Olivier Teytaud,et al.  Polygames: Improved Zero Learning , 2020, J. Int. Comput. Games Assoc..

[5]  Yuandong Tian,et al.  ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero , 2019, ICML.

[6]  Csaba Szepesvári,et al.  Bandit Based Monte-Carlo Planning , 2006, ECML.

[7]  Jacques Pitrat,et al.  Realization of a general game-playing program , 1968, IFIP Congress.

[8]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Tristan Cazenave Improved Policy Networks for Computer Go , 2017, ACG.

[10]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[11]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[12]  Yihui Ren,et al.  Deep Learning Project , 2018 .

[13]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[14]  Tristan Cazenave Spatial Average Pooling for Computer Go , 2018, CGW@IJCAI.

[15]  David J. Wu,et al.  Accelerating Self-Play Learning in Go , 2019, ArXiv.

[16]  David Barber,et al.  Thinking Fast and Slow with Deep Learning and Tree Search , 2017, NIPS.