Review of Deep Learning Methods in Robotic Grasp Detection

For robots to attain more general-purpose utility, grasping is a necessary skill to master. Such general-purpose robots may use their perception abilities to visually identify grasps for a given object. A grasp describes how a robotic end-effector can be arranged to securely grab an object and successfully lift it without slippage. Traditionally, grasp detection has required expert human knowledge to analytically formulate a task-specific algorithm, which is an arduous and time-consuming approach. During the last five years, deep learning methods have enabled significant advancements in robotic vision, natural language processing, and automated driving applications. The success of these methods has driven robotics researchers to explore the use of deep learning in task-generalised robotic applications. This paper reviews the current state of the art in the application of deep learning methods to generalised robotic grasping and discusses how each element of the deep learning approach has improved the overall performance of robotic grasp detection. Several of the most promising approaches are evaluated, and the one-shot detection method is identified as the most suitable for real-time grasp detection. The limited availability of sufficient volumes of appropriate training data is identified as a major obstacle to the effective use of deep learning approaches, and transfer learning techniques are proposed as a potential mechanism to address this. Finally, current trends in the field and potential future research directions are discussed.
