ROBOTURK: A Crowdsourcing Platform for Robotic Skill Learning through Imitation

Imitation learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of reinforcement learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to address this challenge. RoboTurk is a crowdsourcing platform for high-quality, trajectory-based 6-DoF teleoperation through widely available mobile devices (e.g., an iPhone). We evaluate RoboTurk on three manipulation tasks of varying timescales (15-120 s) and observe that our user interface is statistically similar to special-purpose hardware such as virtual reality controllers in terms of task completion times. Furthermore, we observe that poor network conditions, such as low-bandwidth and high-delay links, do not substantially affect remote users' ability to perform task demonstrations successfully on RoboTurk. Lastly, we demonstrate the efficacy of RoboTurk through the collection of a pilot dataset: using RoboTurk, we collected 137.5 hours of manipulation data from remote workers, amounting to over 2200 successful task demonstrations in 22 hours of total system usage. We show that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards, and that using larger quantities of demonstrations during policy learning improves both learning consistency and final performance. For additional results, videos, and to download our pilot dataset, visit roboturk.stanford.edu.
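To make the phone-based 6-DoF teleoperation concrete, the sketch below shows one plausible way a phone's tracked pose could be mapped to relative end-effector commands. This is a minimal illustration, not RoboTurk's actual implementation: the `PhoneTeleop` class and its `pos_scale` and `max_step` parameters are hypothetical names introduced here for exposition.

```python
# Minimal sketch of phone-to-robot 6-DoF teleoperation, assuming the phone's
# AR framework reports a world-frame pose (position + rotation matrix) each
# frame. Hypothetical interface; not RoboTurk's actual code.
import numpy as np

class PhoneTeleop:
    def __init__(self, pos_scale=1.0, max_step=0.05):
        self.pos_scale = pos_scale  # scale phone motion to robot motion
        self.max_step = max_step    # clamp per-frame translation (meters)
        self.prev_pos = None
        self.prev_rot = None        # 3x3 rotation matrix

    def step(self, phone_pos, phone_rot):
        """Convert the phone's current pose into a relative 6-DoF command.

        Returns (delta_pos, delta_rot): a translation delta and a relative
        rotation matrix to apply to the robot's end effector.
        """
        if self.prev_pos is None:
            self.prev_pos, self.prev_rot = phone_pos, phone_rot
            return np.zeros(3), np.eye(3)
        # Relative translation since the last frame, scaled and clamped so a
        # sudden jump in pose tracking cannot command a large robot motion.
        delta_pos = (phone_pos - self.prev_pos) * self.pos_scale
        delta_pos = np.clip(delta_pos, -self.max_step, self.max_step)
        # Relative rotation from the previous frame: R_delta = R_now @ R_prev^T.
        delta_rot = phone_rot @ self.prev_rot.T
        self.prev_pos, self.prev_rot = phone_pos, phone_rot
        return delta_pos, delta_rot
```

Commanding small, clamped *relative* motions rather than absolute poses is one design choice consistent with the paper's observation that high-delay links remain usable: a late packet shifts when a small delta is applied rather than teleporting the commanded target.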

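The abstract's claim that demonstrations enable policy learning on sparse-reward, multi-step tasks can likewise be made concrete. One common family of techniques resets a fraction of training episodes to states drawn from demonstration trajectories, so the agent begins some episodes partway through the task. The sketch below is a generic illustration of that idea under assumed names (`sample_reset_state`, `demo_reset_prob`), not the paper's exact algorithm.

```python
# Generic sketch (not the paper's exact method): exploiting demonstrations for
# sparse-reward RL by resetting some episodes to demonstration states.
import random

def sample_reset_state(demos, env, demo_reset_prob=0.5):
    """Return an initial state: either a demonstration state or a fresh reset.

    `demos` is a list of trajectories, each a list of environment states
    (hypothetical representation; a real system would restore full simulator
    state, e.g., MuJoCo qpos/qvel).
    """
    if demos and random.random() < demo_reset_prob:
        traj = random.choice(demos)   # pick one collected demonstration
        return random.choice(traj)    # start from one of its visited states
    return env.reset()                # otherwise, a standard episode reset
```

Under this kind of scheme, larger demonstration sets cover more of the task's state space, which is one way to read the paper's finding that more demonstrations improve both learning consistency and final performance.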