Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards

Multi-goal reinforcement learning (RL) aims to enable an agent to accomplish multi-goal tasks, which is of great importance for learning scalable robotic manipulation skills. However, reward engineering typically requires strenuous effort in multi-goal RL. Moreover, it inevitably introduces bias that renders the final policy suboptimal. Sparse rewards provide a simple yet effective way to overcome these limitations. Nevertheless, they harm exploration efficiency and may even prevent the policy from converging. In this paper, we propose a density-based curriculum learning method for efficient exploration with sparse rewards and better generalization to the desired goal distribution. Intuitively, our method encourages the robot to gradually broaden the frontier of its ability in directions that cover the entire desired goal space as fully and as quickly as possible. To further improve data efficiency and generalization, we augment the goals and transitions within the allowed region during training. Finally, we evaluate our method on diversified variants of benchmark manipulation tasks that are challenging for existing methods. Empirical results show that our method outperforms state-of-the-art baselines in terms of both data efficiency and success rate.
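
The abstract does not specify the density model or the goal-selection rule, so the sketch below is only a rough illustration of what a density-based curriculum could look like: it fits a kernel density estimate over recently achieved goals and proposes the lowest-density candidate goals as the "frontier" to practice next. The function name, the Gaussian kernel, the bandwidth, and the goal dimensions are all illustrative assumptions, not the paper's actual specification.

```python
# Illustrative sketch only: assumes a Gaussian KDE over achieved goals and
# selects candidate goals from low-density regions as curriculum goals.
import numpy as np
from sklearn.neighbors import KernelDensity

def select_curriculum_goals(achieved_goals, candidate_goals, n_select=16, bandwidth=0.1):
    """Rank candidate goals by the estimated density of achieved goals and
    return the lowest-density (least-practiced) ones as curriculum goals."""
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    kde.fit(achieved_goals)                              # density of goals the agent already reaches
    log_density = kde.score_samples(candidate_goals)     # log p(goal) under the achieved-goal KDE
    frontier_idx = np.argsort(log_density)[:n_select]    # lowest density = frontier of ability
    return candidate_goals[frontier_idx]

# Hypothetical usage: goals are 3-D object positions from a manipulation task.
achieved = np.random.uniform(-0.1, 0.1, size=(512, 3))    # goals reached so far
candidates = np.random.uniform(-0.5, 0.5, size=(256, 3))  # goals sampled from the desired goal space
curriculum = select_curriculum_goals(achieved, candidates)
```

Under this reading, training alternates between collecting experience on the selected frontier goals and refitting the density model, so the practiced region gradually expands toward the full desired goal space.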
