Choreographer: Learning and Adapting Skills in Imagination

Unsupervised skill learning aims to learn a rich repertoire of behaviors without external supervision, providing artificial agents with the ability to control and influence the environment. However, without appropriate knowledge and exploration, skills may provide control only over a restricted area of the environment, limiting their applicability. Furthermore, it is unclear how to leverage the learned skill behaviors for adapting to downstream tasks in a data-efficient manner. We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination. Our method decouples the exploration and skill learning processes, enabling it to discover skills in the latent state space of the model. During adaptation, the agent uses a meta-controller to evaluate and adapt the learned skills efficiently by deploying them in parallel in imagination. Choreographer can learn skills both from offline data and by collecting data simultaneously with an exploration policy. The skills can be used to adapt effectively to downstream tasks, as we show on the URL benchmark, where we outperform previous approaches from both pixel and state inputs. The learned skills also explore the environment thoroughly, finding sparse rewards more frequently, as shown in goal-reaching tasks from the DMC Suite and Meta-World. Project website: https://skillchoreographer.github.io/
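
To make the adaptation phase concrete, below is a minimal sketch of the idea behind the meta-controller: skill policies are rolled out in parallel inside the world model's imagination and ranked by their imagined task return, without extra environment interaction. Every name, dimension, and dynamics function in the sketch is a toy stand-in chosen for illustration, not Choreographer's actual implementation.

```python
import numpy as np

# Toy stand-ins for Choreographer's components (illustrative assumptions only):
# the learned world model is replaced by a fixed tanh map, and the discrete
# skill codes by random vectors in latent space.

rng = np.random.default_rng(0)
LATENT_DIM, NUM_SKILLS, HORIZON = 8, 16, 15

# Random "skill vectors" standing in for a learned skill codebook.
SKILL_VECTORS = rng.normal(size=(NUM_SKILLS, LATENT_DIM))


def imagine_step(latent: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Toy latent dynamics standing in for the learned world model."""
    return np.tanh(0.9 * latent + 0.1 * action)


def skill_policy(latent: np.ndarray, skill_code: int) -> np.ndarray:
    """Toy skill-conditioned policy: steer the latent toward its skill vector."""
    return SKILL_VECTORS[skill_code] - latent


def task_reward(latent: np.ndarray, goal: np.ndarray) -> float:
    """Downstream task reward, here negative distance to a goal latent."""
    return -float(np.linalg.norm(latent - goal))


def evaluate_skills_in_imagination(start_latent: np.ndarray,
                                   goal: np.ndarray) -> np.ndarray:
    """Roll out every skill in parallel inside the model; return imagined returns."""
    latents = np.repeat(start_latent[None], NUM_SKILLS, axis=0)
    returns = np.zeros(NUM_SKILLS)
    for _ in range(HORIZON):
        actions = np.stack([skill_policy(z, k) for k, z in enumerate(latents)])
        latents = np.stack([imagine_step(z, a) for z, a in zip(latents, actions)])
        returns += np.array([task_reward(z, goal) for z in latents])
    return returns


# A meta-controller can then pick (or softly weight) skills by imagined return,
# before any of them is executed in the real environment.
start = rng.normal(size=LATENT_DIM)
goal = rng.normal(size=LATENT_DIM)
imagined_returns = evaluate_skills_in_imagination(start, goal)
print("Best skill by imagined return:", int(np.argmax(imagined_returns)))
```

In the actual method the rollouts happen in the learned latent dynamics of the world model and the meta-controller also adapts the skills rather than merely ranking them; the sketch only illustrates the parallel evaluate-in-imagination structure.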
