Unsupervised Curricula for Visual Meta-Reinforcement Learning

In principle, meta-reinforcement learning algorithms leverage experience across many tasks to learn fast and effective reinforcement learning (RL) strategies. However, current meta-RL approaches rely on manually defined distributions of training tasks, and hand-crafting these task distributions can be challenging and time-consuming. Can "useful" pre-training tasks be discovered in an unsupervised manner? We develop an unsupervised algorithm for inducing an adaptive meta-training task distribution, i.e., an automatic curriculum, by modeling unsupervised interaction in a visual environment. The task distribution is scaffolded by a parametric density model of the meta-learner's trajectory distribution. We formulate unsupervised meta-RL as information maximization between a latent task variable and the meta-learner's data distribution, and describe a practical instantiation that alternates between integrating recent experience into the task distribution and meta-learning on the updated tasks. Repeating this procedure leads to iterative reorganization, so that the curriculum adapts as the meta-learner's data distribution shifts. Moreover, we show how discriminative clustering frameworks for visual representations can support trajectory-level task acquisition and exploration in domains with pixel observations, avoiding the pitfalls of alternatives. In experiments on vision-based navigation and manipulation domains, we show that the algorithm enables unsupervised meta-learning that both transfers to downstream tasks specified by hand-crafted reward functions and serves as pre-training for more efficient meta-learning on test task distributions.
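To make the objective concrete: writing z for the latent task variable and \tau for a trajectory generated by the meta-learner's policy \pi_\theta conditioned on z, the information-maximization formulation admits the standard Barber-Agakov variational lower bound (the paper's exact decomposition and parameterization may differ):

\[
I(\tau; z) \;=\; \mathcal{H}(z) - \mathcal{H}(z \mid \tau)
\;\geq\; \mathcal{H}(z) + \mathbb{E}_{z \sim p(z),\; \tau \sim \pi_\theta(\cdot \mid z)}\!\left[ \log q_\phi(z \mid \tau) \right],
\]

where q_\phi is the parametric density model over trajectories: each of its components acts as a task, and \log q_\phi(z \mid \tau) supplies a trajectory-level reward for behavior the model can attribute to task z.

The alternation itself is easy to sketch. Below is a minimal, illustrative Python sketch of the loop, assuming a Gaussian mixture as the density model; collect_trajectories, task_reward, and meta_rl_update are hypothetical stand-ins (random features and a no-op update here), not the authors' code.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
N_TASKS, FEAT_DIM = 8, 16  # latent task components; trajectory feature size

def collect_trajectories(n=512):
    # Stand-in for meta-learner rollouts: a real system would roll out the
    # policy in the visual environment and embed each trajectory's pixel
    # observations into a trajectory-level feature vector.
    return rng.normal(size=(n, FEAT_DIM))

def task_reward(mixture, z, feats):
    # Discriminative trajectory-level reward: log q(z | tau) - log p(z).
    # Trajectories that the density model attributes to component z beyond
    # its prior weight are rewarded.
    log_post = np.log(mixture.predict_proba(feats)[:, z] + 1e-8)
    return log_post - np.log(mixture.weights_[z] + 1e-8)

def meta_rl_update(reward_fn, feats):
    # Placeholder for one meta-RL update (e.g., an RL^2- or PEARL-style
    # learner) on the task induced by reward_fn; omitted here.
    _ = reward_fn(feats)

# Alternate between task acquisition (refit the density model on fresh
# experience) and meta-learning (train on rewards from the updated model).
for iteration in range(5):
    feats = collect_trajectories()
    mixture = GaussianMixture(n_components=N_TASKS, random_state=0).fit(feats)
    for z in range(N_TASKS):
        meta_rl_update(lambda f, z=z: task_reward(mixture, z, f), feats)

As the meta-learner's behavior shifts, refitting the mixture on fresh experience reorganizes its components, which is what makes the induced curriculum adaptive rather than fixed.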
