Paper Information - CompILE: Compositional Imitation Learning and Execution

CompILE: Compositional Imitation Learning and Execution

We introduce Compositional Imitation Learning and Execution (CompILE): a framework for learning reusable, variable-length segments of hierarchically-structured behavior from demonstration data. CompILE uses a novel unsupervised, fully-differentiable sequence segmentation module to learn latent encodings of sequential data that can be re-composed and executed to perform new tasks. Once trained, our model generalizes to longer sequences and to environment instances not seen during training. We evaluate CompILE in a challenging 2D multi-task environment and a continuous control task, and show that it can find correct task boundaries and event encodings in an unsupervised manner. Latent codes and associated behavior policies discovered by CompILE can be used by a hierarchical agent, where the high-level policy selects actions in the latent code space, and the low-level, task-specific policies are simply the learned decoders. We find that our CompILE-based agent can learn given only sparse rewards, a setting in which agents without task-specific policies struggle.
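The hierarchical execution described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1D environment, the `walk_to` decoder, and the fixed list of latent codes standing in for a learned high-level policy are all hypothetical, chosen only to show how a high-level choice in latent code space is handed to a low-level decoder that runs until its segment ends.

```python
from typing import Callable, List, Tuple

# Hypothetical decoder signature: (latent_code, state) -> (action, segment_done).
Decoder = Callable[[int, int], Tuple[int, bool]]


def walk_to(code: int, state: int) -> Tuple[int, bool]:
    """Toy low-level policy: the latent code is a target position on a 1D line."""
    if state == code:
        return 0, True  # target reached, so this segment is finished
    return (1 if code > state else -1), False


def run_hierarchical_agent(
    latent_plan: List[int],
    decoder: Decoder,
    start_state: int = 0,
    max_steps_per_segment: int = 100,
) -> List[int]:
    """Execute each latent code with the decoder until the decoder signals completion.

    `latent_plan` stands in for the output of a high-level policy (an assumption
    made for illustration; in the paper that policy is learned).
    """
    state = start_state
    trajectory = [state]
    for code in latent_plan:
        for _ in range(max_steps_per_segment):
            action, done = decoder(code, state)
            if done:
                break
            state += action
            trajectory.append(state)
    return trajectory


trajectory = run_hierarchical_agent([2, 0, 3], walk_to)
```

Each latent code produces a segment of variable length, which mirrors the idea of re-composing learned segments into a new, longer behavior.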
