BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize to novel tasks, a long-standing challenge in robot learning. We approach the challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate such generalization. To that end, we develop an interactive and flexible imitation learning system that can learn from both demonstrations and interventions and can be conditioned on different forms of information that convey the task, including pre-trained embeddings of natural language or videos of humans performing the task. When scaling data collection on a real robot to more than 100 distinct tasks, we find that this system can perform 21 unseen manipulation tasks with an average success rate of 44%, without any robot demonstrations for those tasks.
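
To make the conditioning mechanism concrete, the sketch below shows one plausible form of a task-conditioned behavioral-cloning policy: a frozen encoder produces a task embedding from a language instruction (or a human video), and that embedding modulates a vision backbone that regresses a continuous arm action. This is a minimal illustration, not the paper's exact architecture; the FiLM-style modulation, the network sizes, and the `TaskConditionedPolicy` name are assumptions made for the example.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation (assumed here, in the spirit of FiLM):
    scales and shifts visual feature maps using parameters predicted from
    the task embedding."""
    def __init__(self, embed_dim: int, num_channels: int):
        super().__init__()
        self.to_scale = nn.Linear(embed_dim, num_channels)
        self.to_shift = nn.Linear(embed_dim, num_channels)

    def forward(self, features: torch.Tensor, task_embed: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W); task_embed: (B, D)
        gamma = self.to_scale(task_embed).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_shift(task_embed).unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta

class TaskConditionedPolicy(nn.Module):
    """Hypothetical BC policy: image -> conv features, modulated by a
    pre-computed task embedding, -> continuous action (e.g., a 7-DoF command)."""
    def __init__(self, embed_dim: int = 512, action_dim: int = 7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.film = FiLM(embed_dim, num_channels=64)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, image: torch.Tensor, task_embed: torch.Tensor) -> torch.Tensor:
        return self.head(self.film(self.backbone(image), task_embed))

# One behavioral-cloning step on a batch of (image, task embedding, expert action).
# The embeddings would come from a frozen sentence or video encoder; random
# tensors stand in for real data here.
policy = TaskConditionedPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
image = torch.randn(8, 3, 128, 128)   # camera observations
task_embed = torch.randn(8, 512)      # frozen language/video embeddings
expert_action = torch.randn(8, 7)     # demonstrated actions
loss = nn.functional.mse_loss(policy(image, task_embed), expert_action)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Under this design, commanding a new task at test time only requires embedding its instruction with the frozen encoder; no robot demonstrations of that task are needed, which is what makes zero-shot execution of held-out tasks possible in principle.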
