Group Equivariant Deep Reinforcement Learning

In Reinforcement Learning (RL), Convolutional Neural Networks (CNNs) have been successfully applied as function approximators in Deep Q-Learning algorithms, which seek to learn action-value functions and policies in various environments. However, to date, there has been little work on learning representations of the input environment state that are equivariant to symmetry transformations. In this paper, we propose the use of Equivariant CNNs to train RL agents and study their inductive bias for transformation-equivariant Q-value approximation. We demonstrate that equivariant architectures can dramatically enhance the performance and sample efficiency of RL agents in a highly symmetric environment while requiring fewer parameters. Additionally, we show that they are robust to changes in the environment caused by affine transformations.
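To make the notion of a transformation-equivariant representation concrete, the following is a minimal sketch in pure Python of a lifting correlation over the group of 90-degree rotations. It is not the architecture used in the paper; the helper names (`rot90`, `correlate`, `lifting_conv`, `invariant_readout`) and the toy `state` and `filt` values are hypothetical and chosen only for illustration. The sketch checks that rotating the input rotates each feature map and cyclically shifts the rotation channels, and that a global max over all channels is therefore unchanged by the rotation.

```python
def rot90(m):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]


def correlate(x, k):
    """Valid 2D cross-correlation of a square input x with a square filter k."""
    n, m = len(x), len(k)
    p = n - m + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(m) for b in range(m))
         for j in range(p)]
        for i in range(p)
    ]


def lifting_conv(x, k):
    """Correlate x with the four 90-degree rotations of k.

    Output channel r uses the filter rotated r times.
    """
    out, kr = [], k
    for _ in range(4):
        out.append(correlate(x, kr))
        kr = rot90(kr)
    return out


def invariant_readout(channels):
    """Global max over rotation channels and spatial positions."""
    return max(v for ch in channels for row in ch for v in row)


# Toy 4x4 "state" and 2x2 filter (illustrative values only).
state = [[1, 0, 2, 0],
         [0, 3, 0, 1],
         [4, 0, 0, 2],
         [0, 1, 5, 0]]
filt = [[1, -1],
        [2, 0]]

features = lifting_conv(state, filt)
features_rot = lifting_conv(rot90(state), filt)

# Equivariance: channel r for the rotated input equals the rotated
# channel r-1 for the original input (indices taken modulo 4).
equivariant = all(
    features_rot[r] == rot90(features[(r - 1) % 4]) for r in range(4)
)

# Invariance: pooling over all channels and positions removes the rotation.
invariant = invariant_readout(features) == invariant_readout(features_rot)

print(equivariant, invariant)
```

In a Q-network, a structured response of this kind is what lets the value estimates for a transformed state be obtained from those of the original state without learning each orientation separately, which is one plausible reading of the parameter and sample-efficiency gains the abstract reports.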

Kaleem Siddiqi | Pratheeksha Nair | Arnab Kumar Mondal
