Group Equivariant Deep Reinforcement Learning

In Reinforcement Learning (RL), Convolutional Neural Networks (CNNs) have been successfully applied as function approximators in Deep Q-Learning algorithms, which seek to learn action-value functions and policies in various environments. However, to date, there has been little work on learning symmetry-transformation-equivariant representations of the input environment state. In this paper, we propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation-equivariant Q-value approximation. We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters. Additionally, we show that they are robust to changes in the environment caused by affine transformations.
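The equivariance property underlying these architectures can be illustrated with a minimal sketch, assuming a lifting convolution over the four-fold rotation group C4 (the function names and the numpy-only implementation below are illustrative, not taken from the paper): correlating the input with a filter in all four 90-degree orientations yields one feature map per group element, and rotating the input merely permutes (and spatially rotates) those orientation channels.

```python
import numpy as np

def correlate2d_valid(img, ker):
    """Plain 'valid'-mode cross-correlation (no filter flipping), stride 1."""
    n, k = img.shape[0], ker.shape[0]
    out = np.empty((n - k + 1, n - k + 1))
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * ker)
    return out

def c4_lifting_conv(img, ker):
    """Lifting convolution for the rotation group C4.

    Correlates the input with the filter in each of its four 90-degree
    orientations, producing one feature map per group element. Rotating
    the input permutes these orientation channels (each channel also
    rotating spatially) -- the equivariance property of group convolutions.
    """
    return np.stack([correlate2d_valid(img, np.rot90(ker, r)) for r in range(4)])

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))   # stand-in for a square grid-world state
ker = rng.standard_normal((3, 3))

out = c4_lifting_conv(img, ker)              # shape (4, 6, 6)
out_rot = c4_lifting_conv(np.rot90(img), ker)

# Each channel of the rotated input's output is a rotated copy of
# some channel of the original output (equivariance).
for r in range(4):
    assert any(np.allclose(out_rot[r], np.rot90(out[rp], k))
               for rp in range(4) for k in (1, -1))

# A group-pooled readout (max over orientations and positions) is
# therefore invariant to 90-degree rotations of the input state.
assert np.isclose(out.max(), out_rot.max())
```

In a Q-network built from such layers, weights are shared across the group orbit, which is why the equivariant agent needs fewer parameters and generalizes across symmetric configurations of the environment.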
