State Entropy Maximization with Random Encoders for Efficient Exploration

Recent exploration methods have proven to be an effective recipe for improving sample-efficiency in deep reinforcement learning (RL). However, efficient exploration in high-dimensional observation spaces remains a challenge. This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward. To estimate state entropy in environments with high-dimensional observations, we use a k-nearest neighbor entropy estimator in the low-dimensional representation space of a convolutional encoder. In particular, we find that state entropy can be estimated in a stable and compute-efficient manner by employing a randomly initialized encoder that is fixed throughout training. Our experiments show that RE3 significantly improves the sample-efficiency of both model-free and model-based RL methods on locomotion and navigation tasks from the DeepMind Control Suite and MiniGrid benchmarks. We also show that RE3 allows learning diverse behaviors without extrinsic rewards, effectively improving sample-efficiency in downstream tasks. Source code and videos are available at https://sites.google.com/view/re3-rl.
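
To make the reward computation concrete, below is a minimal sketch of the k-NN state-entropy bonus computed in the representation space of a fixed, randomly initialized convolutional encoder, assuming PyTorch. The class and function names (RandomEncoder, compute_re3_reward), the specific architecture, and hyperparameters (k=3, 50-dimensional features) are illustrative assumptions rather than the paper's reference implementation.

```python
# Sketch of an RE3-style intrinsic reward: distance to the k-th nearest
# neighbor in the representation space of a frozen random encoder.
# Architecture and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Convolutional encoder kept at its random initialization;
    it is never updated during training."""

    def __init__(self, obs_shape=(3, 84, 84), feature_dim=50):
        super().__init__()
        c = obs_shape[0]
        self.convs = nn.Sequential(
            nn.Conv2d(c, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.convs(torch.zeros(1, *obs_shape)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)
        # Freeze all parameters: no gradients ever flow through the encoder.
        for p in self.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, obs):
        # Assumes pixel observations in [0, 255].
        return self.fc(self.convs(obs / 255.0))


@torch.no_grad()
def compute_re3_reward(encoder, obs_batch, k=3):
    """Intrinsic reward r_i = log(||y_i - y_i^(k-NN)||_2 + 1), where y_i is
    the fixed random encoding of observation i and the k-th nearest neighbor
    is searched within the given batch of encodings."""
    y = encoder(obs_batch)                      # (B, feature_dim)
    dists = torch.cdist(y, y, p=2)              # pairwise L2 distances (B, B)
    # k+1 because each point's closest "neighbor" is itself (distance 0).
    knn_dists, _ = dists.topk(k + 1, largest=False, dim=1)
    return torch.log(knn_dists[:, -1] + 1.0)    # (B,) intrinsic rewards


if __name__ == "__main__":
    enc = RandomEncoder()
    obs = torch.randint(0, 256, (64, 3, 84, 84)).float()
    print(compute_re3_reward(enc, obs, k=3).shape)  # torch.Size([64])
```

In practice this bonus would be added to the extrinsic reward (with a scaling coefficient) when training any model-free or model-based agent; because the encoder is frozen, the representations of stored observations can be computed once and reused, which is what makes the estimate compute-efficient.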
