CompILE: Compositional Imitation Learning and Execution

We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to sequences of longer length and from environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We found that our CompILE-based agent could learn given only sparse rewards, where agents without task-specific policies struggle.

[1]  D. Davidson Inquiries Into Truth and Interpretation , 1984 .

[2]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[3]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[4]  Jeffrey M. Zacks,et al.  Perceiving, remembering, and communicating structure in events. , 2001, Journal of experimental psychology. General.

[5]  David M. Blei,et al.  Topic segmentation with an aspect hidden Markov model , 2001, SIGIR '01.

[6]  T. Griffiths,et al.  A Bayesian framework for word segmentation: Exploring the effects of context , 2009, Cognition.

[7]  L. Davachi,et al.  What Constitutes an Episode in Episodic Memory? , 2011, Psychological science.

[8]  Alex Graves,et al.  Supervised Sequence Labelling , 2012 .

[9]  Scott Niekum,et al.  Incremental Semantically Grounded Learning from Demonstration , 2013, Robotics: Science and Systems.

[10]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[11]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[12]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[13]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[14]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Oliver Kroemer,et al.  Towards learning hierarchical skills for multi-phase manipulation tasks , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[18]  Jakob Uszkoreit,et al.  A Decomposable Attention Model for Natural Language Inference , 2016, EMNLP.

[19]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[20]  Ryan P. Adams,et al.  Composing graphical models with neural networks for structured representations and fast inference , 2016, NIPS.

[21]  Kenneth A. Norman,et al.  Discovering Event Structure in Continuous Narrative Perception and Memory , 2016, Neuron.

[22]  Bernard Ghanem,et al.  DAPs: Deep Action Proposals for Action Understanding , 2016, ECCV.

[23]  Joshua B. Tenenbaum,et al.  Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation , 2016, NIPS.

[24]  Tom Schaul,et al.  FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[25]  Yu Zhang,et al.  Latent Sequence Decompositions , 2016, ICLR.

[26]  Ion Stoica,et al.  Multi-Level Discovery of Deep Options , 2017, ArXiv.

[27]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[28]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[29]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[30]  Jeffrey M. Zacks,et al.  Event boundaries in memory and cognition , 2017, Current Opinion in Behavioral Sciences.

[31]  Honglak Lee,et al.  Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[32]  Dan Klein,et al.  Modular Multitask Reinforcement Learning with Policy Sketches , 2016, ICML.

[33]  Pieter Abbeel,et al.  Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.

[34]  Yee Whye Teh,et al.  The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.

[35]  Chong Wang,et al.  Sequence Modeling via Segmentations , 2017, ICML.

[36]  Juan Carlos Niebles,et al.  Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Gaurav S. Sukhatme,et al.  Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets , 2017, NIPS.

[38]  Jeffrey M. Zacks,et al.  Constructing Experience: Event Models from Perception to Action , 2017, Trends in Cognitive Sciences.

[39]  Anca D. Dragan,et al.  DART: Noise Injection for Robust Imitation Learning , 2017, CoRL.

[40]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[41]  Le Song,et al.  Recurrent Hidden Semi-Markov Model , 2017, ICLR.

[42]  Ion Stoica,et al.  DDCO: Discovery of Deep Continuous Options for Robot Learning from Demonstrations , 2017, CoRL.

[43]  Gerald Tesauro,et al.  Learning Abstract Options , 2018, NeurIPS.

[44]  Shimon Whiteson,et al.  TACO: Learning Task Decomposition via Temporal Alignment for Control , 2018, ICML.

[45]  Chong Wang,et al.  Subgoal Discovery for Hierarchical Dialogue Policy Learning , 2018, EMNLP.

[46]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[47]  Stefan Bauer,et al.  Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models , 2018, NeurIPS.

[48]  Yuval Tassa,et al.  DeepMind Control Suite , 2018, ArXiv.

[49]  Hyeonwoo Noh,et al.  Neural Program Synthesis from Diverse Demonstration Videos , 2018, ICML.

[50]  Dawn Xiaodong Song,et al.  Parametrized Hierarchical Procedures for Neural Programming , 2018, ICLR.

[51]  Joseph J. Lim,et al.  KeyIn: Discovering Subgoal Structure with Keyframe-based Video Prediction , 2019, ArXiv.

[52]  Joseph J. Lim,et al.  Composing Complex Skills by Learning Transition Policies , 2018, ICLR.

[53]  Alexei A. Efros,et al.  Time-Agnostic Prediction: Predicting Predictable Video Frames , 2018, ICLR.

[54]  Mohit Sharma,et al.  Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information , 2018, ICLR.