Learning to Generalize Across Long-Horizon Tasks from Human Demonstrations

Imitation learning is an effective and safe technique to train robot policies in the real world because it does not depend on an expensive random exploration process. However, due to the lack of exploration, learning policies that generalize beyond the demonstrated behaviors is still an open challenge. We present a novel imitation learning framework to enable robots to 1) learn complex real world manipulation tasks efficiently from a small number of human demonstrations, and 2) synthesize new behaviors not contained in the collected demonstrations. Our key insight is that multi-task domains often present a latent structure, where demonstrated trajectories for different tasks intersect at common regions of the state space. We present Generalization Through Imitation (GTI), a two-stage offline imitation learning algorithm that exploits this intersecting structure to train goal-directed policies that generalize to unseen start and goal state combinations. In the first stage of GTI, we train a stochastic policy that leverages trajectory intersections to have the capacity to compose behaviors from different demonstration trajectories together. In the second stage of GTI, we collect a small set of rollouts from the unconditioned stochastic policy of the first stage, and train a goal-directed agent to generalize to novel start and goal configurations. We validate GTI in both simulated domains and a challenging long-horizon robotic manipulation domain in the real world. Additional results and videos are available at this https URL .

[1]  Pushmeet Kohli,et al.  CompILE: Compositional Imitation Learning and Execution , 2018, ICML.

[2]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[3]  Marcin Andrychowicz,et al.  One-Shot Imitation Learning , 2017, NIPS.

[4]  Darwin G. Caldwell,et al.  Learning and Reproduction of Gestures by Imitation , 2010, IEEE Robotics & Automation Magazine.

[5]  Henk Nijmeijer,et al.  Robot Programming by Demonstration , 2010, SIMPAR.

[6]  Stefan Schaal,et al.  Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.

[7]  Nando de Freitas,et al.  Robust Imitation of Diverse Behaviors , 2017, NIPS.

[8]  Silvio Savarese,et al.  Neural Task Graphs: Generalizing to Unseen Tasks From a Single Video Demonstration , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[10]  Byron Boots,et al.  IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Alexey Dosovitskiy,et al.  End-to-End Driving Via Conditional Imitation Learning , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[12]  Claude Sammut,et al.  A Framework for Behavioural Cloning , 1995, Machine Intelligence 15.

[13]  Oussama Khatib,et al.  A unified approach for motion and force control of robot manipulators: The operational space formulation , 1987, IEEE J. Robotics Autom..

[14]  Sergey Levine,et al.  Deep spatial autoencoders for visuomotor learning , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[16]  Sergey Levine,et al.  Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings , 2018, ICML.

[17]  Jan Peters,et al.  Imitation and Reinforcement Learning , 2010, IEEE Robotics & Automation Magazine.

[18]  Devesh K. Jha,et al.  Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control , 2019, ArXiv.

[19]  Ion Stoica,et al.  Learning to Optimize Join Queries With Deep Reinforcement Learning , 2018, ArXiv.

[20]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[21]  Silvio Savarese,et al.  Neural Task Programming: Learning to Generalize Across Hierarchical Tasks , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Jun Nakanishi,et al.  Movement imitation with nonlinear dynamical systems in humanoid robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[23]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[24]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[25]  Jan Peters,et al.  Hierarchical Relative Entropy Policy Search , 2014, AISTATS.

[26]  Stefan Schaal,et al.  Robot Programming by Demonstration , 2009, Springer Handbook of Robotics.

[27]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[28]  Sergey Levine,et al.  Learning Latent Plans from Play , 2019, CoRL.

[29]  Peter Englert,et al.  Learning manipulation skills from a single demonstration , 2018, Int. J. Robotics Res..

[30]  Sergey Levine,et al.  AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos , 2020, Robotics: Science and Systems.

[31]  Jitendra Malik,et al.  Zero-Shot Visual Imitation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[32]  Silvio Savarese,et al.  Deep Local Trajectory Replanning and Control for Robot Navigation , 2019, 2019 International Conference on Robotics and Automation (ICRA).

[33]  Sergey Levine,et al.  One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.

[34]  Jan Peters,et al.  Learning motor primitives for robotics , 2009, 2009 IEEE International Conference on Robotics and Automation.

[35]  Alexei A. Efros,et al.  Time-Agnostic Prediction: Predicting Predictable Video Frames , 2018, ICLR.

[36]  Pieter Abbeel,et al.  Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.

[37]  Silvio Savarese,et al.  ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation , 2018, CoRL.