Unsupervised Curricula for Visual Meta-Reinforcement Learning

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast and effective reinforcement learning (RL) strategies. However, current meta-RL approaches rely on manually defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can "useful" pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e., an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation that alternates between integrating recent experience into the task distribution and meta-learning on the updated tasks. Repeating this procedure leads to iterative reorganization, so that the curriculum adapts as the meta-learner's data distribution shifts. Moreover, we show how discriminative clustering frameworks for visual representations can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding the pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm enables unsupervised meta-learning that both transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient meta-learning on test task distributions.
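To make the objective concrete: writing z for the latent task variable and \tau for a trajectory generated by the meta-learner's policy \pi_\theta conditioned on z, the information-maximization formulation admits the standard Barber-Agakov variational lower bound (the paper's exact decomposition and parameterization may differ):

\[
I(\tau; z) \;=\; \mathcal{H}(z) - \mathcal{H}(z \mid \tau)
\;\geq\; \mathcal{H}(z) + \mathbb{E}_{z \sim p(z),\; \tau \sim \pi_\theta(\cdot \mid z)}\!\left[ \log q_\phi(z \mid \tau) \right],
\]

where q_\phi is the parametric density model over trajectories: each of its components acts as a task, and \log q_\phi(z \mid \tau) supplies a trajectory-level reward for behavior the model can attribute to task z.

The alternation itself is easy to sketch. Below is a minimal, illustrative Python sketch of the loop, assuming a Gaussian mixture as the density model; collect_trajectories, task_reward, and meta_rl_update are hypothetical stand-ins (random features and a no-op update here), not the authors' code.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
N_TASKS, FEAT_DIM = 8, 16  # latent task components; trajectory feature size

def collect_trajectories(n=512):
    # Stand-in for meta-learner rollouts: a real system would roll out the
    # policy in the visual environment and embed each trajectory's pixel
    # observations into a trajectory-level feature vector.
    return rng.normal(size=(n, FEAT_DIM))

def task_reward(mixture, z, feats):
    # Discriminative trajectory-level reward: log q(z | tau) - log p(z).
    # Trajectories that the density model attributes to component z beyond
    # its prior weight are rewarded.
    log_post = np.log(mixture.predict_proba(feats)[:, z] + 1e-8)
    return log_post - np.log(mixture.weights_[z] + 1e-8)

def meta_rl_update(reward_fn, feats):
    # Placeholder for one meta-RL update (e.g., an RL^2- or PEARL-style
    # learner) on the task induced by reward_fn; omitted here.
    _ = reward_fn(feats)

# Alternate between task acquisition (refit the density model on fresh
# experience) and meta-learning (train on rewards from the updated model).
for iteration in range(5):
    feats = collect_trajectories()
    mixture = GaussianMixture(n_components=N_TASKS, random_state=0).fit(feats)
    for z in range(N_TASKS):
        meta_rl_update(lambda f, z=z: task_reward(mixture, z, f), feats)

As the meta-learner's behavior shifts, refitting the mixture on fresh experience reorganizes its components, which is what makes the induced curriculum adaptive rather than fixed.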
