Paper information - Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours

Current model-free, learning-based robot grasping approaches exploit human-labeled datasets for training the models. However, there are two problems with such a methodology: (a) since each object can be grasped in multiple ways, manually labeling grasp locations is not a trivial task; (b) human labeling is biased by semantics. While there have been attempts to train robots using trial-and-error experiments, the amount of data used in such experiments remains substantially low and hence makes the learner prone to overfitting. In this paper, we take the leap of increasing the available training data to 40 times more than prior work, leading to a dataset size of 50K data points collected over 700 hours of robot grasping attempts. This allows us to train a Convolutional Neural Network (CNN) for the task of predicting grasp locations without severe overfitting. In our formulation, we recast the regression problem to an 18-way binary classification over image patches. We also present a multi-stage learning approach where a CNN trained in one stage is used to collect hard negatives in subsequent stages. Our experiments clearly show the benefit of using large-scale datasets (and multi-stage training) for the task of grasping. We also compare to several baselines and show state-of-the-art performance on generalization to unseen objects for grasping.
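
One way to picture the "18-way binary classification" mentioned in the abstract is sketched below. This is an illustrative reading, not the authors' implementation: the abstract only gives the number 18, so the uniform binning of the gripper angle over 0 to 180 degrees, the per-bin sigmoid output, and the rule that only the physically tried bin contributes to the loss are all assumptions, and every function name here is hypothetical.

```python
import math

NUM_ANGLE_BINS = 18  # from the abstract: 18-way binary classification
BIN_WIDTH_DEG = 180.0 / NUM_ANGLE_BINS  # assumption: bins evenly cover 0-180 degrees


def angle_to_bin(angle_deg):
    """Map a gripper angle in degrees to one of the 18 bins (assumed uniform)."""
    return int((angle_deg % 180.0) // BIN_WIDTH_DEG)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_binary_loss(logits, tried_bin, success):
    """Binary cross-entropy on the single angle bin that was actually tried.

    logits: 18 raw scores for one image patch, one per angle bin.
    tried_bin: index of the angle bin the robot attempted (assumed known).
    success: 1 if the grasp succeeded, 0 otherwise.
    The other 17 bins carry no label for this trial, so they are ignored.
    """
    p = sigmoid(logits[tried_bin])
    eps = 1e-12  # guards against log(0)
    return -(success * math.log(p + eps) + (1 - success) * math.log(1.0 - p + eps))


def best_angle(logits):
    """Return the centre angle (degrees) of the highest-scoring bin."""
    k = max(range(NUM_ANGLE_BINS), key=lambda i: logits[i])
    return (k + 0.5) * BIN_WIDTH_DEG
```

Under these assumptions, a single grasp attempt labels only one of the 18 outputs for its patch, which is why the problem can be treated as 18 independent binary classifiers rather than one regression over the angle.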

Abhinav Gupta | Lerrel Pinto
