Improved Feature Learning: A Maximum-Average-Out Deep Neural Network for the Game Go

Computer game-playing programs based on deep reinforcement learning have surpassed even the best human players. However, the huge search space these neural networks must analyze and their large number of parameters demand extensive computing power. In this study, we therefore aimed to improve learning efficiency by modifying the network structure, which should reduce both the number of training iterations and the computing power required. We propose a convolutional neural network built from maximum-average-out (MAO) units, a structure inspired by piecewise functions, which learns features effectively and enhances the expressive power of hidden-layer features. To evaluate the MAO structure, we compared it with ResNet18 by embedding each network in the AlphaGo Zero framework, which was developed to play the game of Go. Both networks were trained from scratch on a low-cost server. The MAO network won eight of ten games against the ResNet18 network. This advantage over ResNet18 is significant for the further development of game algorithms that require less computing power than those currently in use.
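
The abstract says only that the MAO unit is based on piecewise functions and does not define it here. As a rough illustration, the sketch below assumes an MAO-style unit behaves like maxout (an elementwise maximum over k candidate feature maps, which is piecewise linear when each map is an affine function of the input, as a convolution output is) followed by an averaging step. The grouping scheme and the names `maxout`, `max_average_out`, and `group_size` are hypothetical and are not taken from the paper; plain Python lists are used so the example has no dependencies.

```python
from typing import List, Sequence

# A feature map flattened to a list of floats (e.g. a 19x19 Go-board plane -> 361 values).
FeatureMap = List[float]


def maxout(maps: Sequence[FeatureMap]) -> FeatureMap:
    """Elementwise maximum over k candidate feature maps (the maxout activation).

    If each map is an affine function of the input (as a convolution output is),
    the result is a convex piecewise-linear function of that input.
    """
    if not maps:
        raise ValueError("need at least one feature map")
    return [max(vals) for vals in zip(*maps)]


def max_average_out(maps: Sequence[FeatureMap], group_size: int) -> FeatureMap:
    """HYPOTHETICAL MAO-style unit (an illustrative assumption, not the paper's definition):
    apply maxout inside each consecutive group of `group_size` maps, then take the
    elementwise mean of the group maxima. A mean of piecewise-linear functions is
    still piecewise linear.
    """
    k = len(maps)
    if group_size <= 0 or k == 0 or k % group_size != 0:
        raise ValueError("number of maps must be a positive multiple of group_size")
    group_maxima = [maxout(maps[i:i + group_size]) for i in range(0, k, group_size)]
    n_groups = len(group_maxima)
    return [sum(vals) / n_groups for vals in zip(*group_maxima)]


if __name__ == "__main__":
    # Four candidate 2-element "feature maps" standing in for k = 4 parallel convolutions.
    maps = [[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0], [2.0, -4.0]]
    print(maxout(maps))              # [2.0, 3.0]
    print(max_average_out(maps, 2))  # [1.5, 1.5]
```

With `group_size` equal to the number of maps, the sketch reduces to plain maxout. With `group_size = 1`, it reduces to a plain average of the maps. Intermediate group sizes blend the two behaviors.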
