Convolutional Residual Network for Grasp Localization

Object grasping is an important ability for carrying out complex manipulation tasks with autonomous robotic systems. The grasp localization module plays an essential role in the success of the grasp maneuver. Grasp localization is generally viewed as a vision perception problem whose goal is to determine regions of high graspability by interpreting light and depth information. Over the past few years, several works in Deep Learning (DL) have shown the high potential of Convolutional Neural Networks (CNNs) for solving vision-related problems. Advances in residual networks have further facilitated neural network training by improving convergence time and generalization performance through identity skip connections and residual mappings. In this paper, we investigate the use of residual networks for grasp localization. A standard residual CNN for object recognition uses a global average pooling layer prior to the fully-connected layers. Our experiments show that this pooling layer removes the spatial correlation in the back-propagated error signal, which prevents the network from correctly localizing good grasp regions. We propose an architecture modification that removes this limitation. In our experiments on the Cornell task, our network achieved state-of-the-art performance, with rectangle metric errors of 10.85% and 11.86% on the image-wise and object-wise splits, respectively. We did not use pre-training, opting instead for on-line data augmentation to manage overfitting. Compared to a previous approach that employed off-line data augmentation, our network used 15x fewer observations, which significantly reduced training time.
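The effect of global average pooling on the back-propagated error can be illustrated with a small NumPy sketch. It is not the network from this paper: the feature-map size (4 channels, 7x7), the function names, and the flattening alternative used for contrast are all assumptions made for illustration, and the abstract does not specify what the proposed modification is. The sketch only shows that the gradient passing backward through an average pooling layer is identical at every spatial position of a channel, whereas a layer that keeps the spatial layout can pass back a different error at each position.

```python
import numpy as np


def global_avg_pool_forward(x):
    """Average each channel over its spatial extent: (C, H, W) -> (C,)."""
    return x.mean(axis=(1, 2))


def global_avg_pool_backward(grad_out, shape):
    """Gradient of the pooled output w.r.t. the input feature map.

    Each input position contributes 1 / (H * W) to its channel's mean,
    so every position in a channel receives the same gradient value.
    """
    _, h, w = shape
    per_position = grad_out[:, None, None] / (h * w)
    return np.broadcast_to(per_position, shape).copy()


def flatten_backward(grad_out, shape):
    """Gradient through a plain flatten (hypothetical contrast case).

    Flattening only reshapes, so each position keeps its own gradient.
    """
    return grad_out.reshape(shape)


rng = np.random.default_rng(0)
shape = (4, 7, 7)  # assumed (channels, height, width), for illustration
feature_map = rng.standard_normal(shape)

# Path 1: global average pooling, then an upstream gradient per channel.
pooled = global_avg_pool_forward(feature_map)
grad_pooled = rng.standard_normal(pooled.shape)
grad_gap = global_avg_pool_backward(grad_pooled, shape)

# Path 2: flatten, then an upstream gradient per flattened element.
grad_flat = rng.standard_normal(feature_map.size)
grad_no_pool = flatten_backward(grad_flat, shape)

# Spatial variation of the gradient inside each channel.
spatial_std_gap = grad_gap.std(axis=(1, 2))
spatial_std_flat = grad_no_pool.std(axis=(1, 2))

print("spatial std of gradient with average pooling:", spatial_std_gap)
print("spatial std of gradient with flatten:        ", spatial_std_flat)
```

With pooling, the per-channel spatial standard deviation of the gradient is zero up to floating-point rounding, so the error signal carries no information about where in the feature map a correction is needed. This is consistent with the limitation described above, although the sketch does not reproduce the paper's experiments.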
