Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation

Imitation learning is a powerful paradigm for robot skill acquisition. However, obtaining demonstrations suitable for learning a policy that maps from raw pixels to actions can be challenging. In this paper, we describe how consumer-grade virtual reality headsets and hand-tracking hardware can be used to teleoperate robots naturally through complex tasks. We also describe how imitation learning can train deep neural network policies, mapping from pixels to actions, that acquire the demonstrated skills. Our experiments showcase the effectiveness of our approach for learning visuomotor skills.
