Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Deep reinforcement learning (RL) agents often fail to generalize to unseen environments that are semantically similar to those they were trained on, particularly when trained on high-dimensional state spaces such as images. In this paper, we propose a simple technique to improve the generalization ability of deep RL agents by introducing a randomized (convolutional) neural network that randomly perturbs input observations. This enables trained agents to adapt to new domains by learning features that are robust and invariant across varied and randomized environments. Furthermore, we consider an inference method based on a Monte Carlo approximation to reduce the variance induced by this randomization. We demonstrate the effectiveness of our method on 2D CoinRun, 3D DeepMind Lab exploration, and 3D robotics control tasks: it significantly outperforms various regularization and data augmentation methods aimed at the same goal.
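To make the mechanism concrete, below is a minimal PyTorch sketch, not the authors' implementation: the names `RandomizedConv` and `mc_action_probs`, as well as the per-forward-pass re-initialization schedule, are illustrative assumptions based only on the description above (a random convolutional layer that perturbs observations, plus Monte Carlo averaging at inference).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomizedConv(nn.Module):
    """Convolution whose weights are freshly re-drawn before each use,
    perturbing the observation without changing its shape. The layer
    itself is never trained."""

    def __init__(self, channels: int = 3, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Re-initialize (Xavier/Glorot normal) so every pass applies a
        # new random perturbation of the input observation.
        nn.init.xavier_normal_(self.conv.weight)
        with torch.no_grad():
            return self.conv(obs)


def mc_action_probs(policy_net: nn.Module,
                    rand_layer: RandomizedConv,
                    obs: torch.Tensor,
                    num_samples: int = 10) -> torch.Tensor:
    """Monte Carlo inference: average the policy's action distribution
    over several random draws of the perturbation layer to reduce the
    variance the randomization induces at test time."""
    probs = torch.stack([
        F.softmax(policy_net(rand_layer(obs)), dim=-1)
        for _ in range(num_samples)
    ])
    return probs.mean(dim=0)
```

A toy usage example (the policy here is a placeholder, not the paper's architecture):

```python
policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))  # toy policy
rand = RandomizedConv(channels=3)
obs = torch.rand(1, 3, 64, 64)        # one 64x64 RGB observation
action_dist = mc_action_probs(policy, rand, obs)  # shape (1, 4)
```

During training, observations would pass through `RandomizedConv` before the policy network, forcing the learned features to be invariant to the random perturbations; at test time, the Monte Carlo average stabilizes the resulting action distribution.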
