Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours

Current model-free, learning-based robot grasping approaches rely on human-labeled datasets to train their models. This methodology has two problems: (a) since each object can be grasped in multiple ways, manually labeling grasp locations is not a trivial task; (b) human labeling is biased by semantics. While there have been attempts to train robots using trial-and-error experiments, the amount of data used in such experiments remains small, which makes the learner prone to overfitting. In this paper, we increase the available training data to 40 times that of prior work, yielding a dataset of 50K data points collected over 700 hours of robot grasping attempts. This allows us to train a Convolutional Neural Network (CNN) to predict grasp locations without severe overfitting. In our formulation, we recast the regression problem as an 18-way binary classification over image patches. We also present a multi-stage learning approach in which a CNN trained in one stage is used to collect hard negatives in subsequent stages. Our experiments clearly show the benefit of large-scale datasets (and multi-stage training) for the task of grasping. We also compare against several baselines and show state-of-the-art generalization to unseen objects.
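The 18-way binary classification can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the 18 outputs correspond to gripper angles evenly binned over [0, 180) degrees, and that each grasp attempt supervises only the output for the angle that was actually tried. The names `angle_to_bin` and `masked_binary_loss` are hypothetical.

```python
import numpy as np

NUM_ANGLE_BINS = 18                      # from the abstract: 18-way binary classification
BIN_WIDTH_DEG = 180.0 / NUM_ANGLE_BINS   # assumption: bins evenly cover [0, 180) degrees


def angle_to_bin(angle_deg: float) -> int:
    """Map a gripper angle in degrees to a bin index (assumes symmetry modulo 180)."""
    return int((angle_deg % 180.0) // BIN_WIDTH_DEG)


def masked_binary_loss(logits, tried_bins, successes) -> float:
    """Mean binary cross-entropy over the single angle bin tried for each patch.

    logits:     (N, 18) raw scores, one per angle bin, for N image patches
    tried_bins: (N,) index of the angle bin executed on each patch
    successes:  (N,) 1.0 if the grasp succeeded, 0.0 otherwise
    """
    logits = np.asarray(logits, dtype=float)
    tried_bins = np.asarray(tried_bins)
    successes = np.asarray(successes, dtype=float)

    # Select only the logit for the executed angle; the other 17 receive no gradient.
    picked = logits[np.arange(len(logits)), tried_bins]

    # Numerically stable binary cross-entropy on logits.
    losses = (np.maximum(picked, 0.0) - picked * successes
              + np.log1p(np.exp(-np.abs(picked))))
    return float(losses.mean())
```

Masking the loss this way reflects that a single physical trial says nothing about the angles that were not attempted, so those outputs are left unsupervised for that sample.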
