Learning Multi-Objective Curricula for Deep Reinforcement Learning

Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL). They are designed to control how a DRL agent collects data, inspired by how humans gradually adapt their learning process to their capabilities. For example, ACL can be used for subgoal generation, reward shaping, environment generation, or initial-state generation. However, prior work considers curriculum learning under only one of these predefined paradigms. It remains unclear which of these paradigms are complementary, and how their combination can be learned from interactions with the environment. Therefore, in this paper, we propose a unified automatic curriculum learning framework that creates multi-objective yet coherent curricula generated by a set of parametric curriculum modules. Each curriculum module is instantiated as a neural network and is responsible for generating a particular curriculum. To coordinate these potentially conflicting modules in a unified parameter space, we propose a multi-task hyper-net learning framework that uses a single hyper-net to parameterize all of the curriculum modules. In addition to existing hand-designed curriculum paradigms, we further design a flexible memory mechanism to learn an abstract curriculum, which may otherwise be difficult to design manually. We evaluate our method on a series of robotic manipulation tasks and demonstrate its superiority over other state-of-the-art ACL methods in terms of sample efficiency and final performance.
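To make the central architectural idea concrete, below is a minimal PyTorch sketch of a single hyper-net whose output parameterizes several curriculum modules. The module list, layer sizes, and the linear form of each generated module are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class CurriculumHyperNet(nn.Module):
    """One hyper-net whose output parameterizes every curriculum module.

    A sketch of the idea only: module names, layer sizes, and the linear
    form of each generated module are illustrative assumptions.
    """

    MODULES = ("subgoal", "reward_shaping", "environment", "initial_state")

    def __init__(self, embed_dim=16, hidden_dim=64, in_dim=32, out_dim=8):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # One learned embedding per curriculum module; the shared trunk
        # below is the single set of hyper-net parameters that all modules
        # are generated from.
        self.embeddings = nn.Embedding(len(self.MODULES), embed_dim)
        n_params = out_dim * in_dim + out_dim  # weight matrix + bias
        self.trunk = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_params),
        )

    def forward(self, module: str, x: torch.Tensor) -> torch.Tensor:
        # Generate the requested module's weights, then apply them to the
        # input (e.g. an encoding of the agent's current state).
        emb = self.embeddings.weight[self.MODULES.index(module)]
        params = self.trunk(emb)
        w = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = params[self.out_dim * self.in_dim:]
        return x @ w.t() + b


hyper = CurriculumHyperNet()
state_enc = torch.randn(1, 32)                # stand-in state encoding
subgoal = hyper("subgoal", state_enc)         # one module's curriculum output
shaping = hyper("reward_shaping", state_enc)  # another, from the same hyper-net
```

Because every module's weights come from one shared trunk, gradients from all curricula update the same hyper-net parameters; this is one way a unified parameter space can keep otherwise conflicting curriculum modules coherent.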
