GI-NNet & RGI-NNet: Development of Robotic Grasp Pose Models, Trainable with Large as well as Limited Labelled Training Datasets, under Supervised and Semi-Supervised Paradigms

Grasp manipulation by cobots (collaborative robots) is expected to work the way we grasp objects, even complex ones. However, executing an intelligent and optimal grasp efficiently, the way humans do, is quite challenging for a robot. The reason is that we acquire this skill by spending a lot of time in our childhood trying and failing to pick things up and learning from our mistakes, and for robots we cannot wait through the equivalent of an entire robotic childhood. To streamline the process, in the present investigation we propose to use deep learning techniques to help robots learn quickly how to generate and execute appropriate grasps. More specifically, we develop two models. The first is a Generative Inception Neural Network (GI-NNet) model capable of generating antipodal robotic grasps on seen as well as unseen objects. Trained on the Cornell Grasping Dataset (CGD), it performs excellently, attaining 98.87% grasp pose accuracy when detecting grasps from RGB-Depth (RGB-D) images of both regular and irregular shaped objects, while requiring only one third of the trainable network parameters of the State-Of-The-Art (SOTA) approach [1]. However, to attain this level of performance the model requires 90% of the available labelled data of CGD for training, keeping only the remaining 10% for testing, which makes it vulnerable to poor generalization. Moreover, obtaining sufficient, high-quality labelled data that keeps pace with the requirements of such gigantic networks is becoming increasingly difficult. To address these issues, we subsequently attach our model as the decoder of a semi-supervised learning architecture known as the Vector Quantized Variational Auto-Encoder (VQVAE), which we train efficiently on the available labelled data together with unlabelled data [2]. Our proposed GI-NNet-integrated VQVAE model, which we name Representation based GI-NNet (RGI-NNet), has been trained on CGD with various labelled-data splits, ranging from as little as 10% up to 50% of the labelled data, together with the latent embeddings generated by the VQVAE. The grasp pose accuracy of RGI-NNet varies between 92.13% and 95.6%, which is far better than that of many existing SOTA models trained only on labelled data. For performance verification of both the GI-NNet and RGI-NNet models, we use the Anukul (Baxter) cobot hardware and observe that both proposed models perform significantly better in real-time tabletop grasp executions. The design details of our proposed models, together with in-depth analyses, are presented in the paper.

Keywords: Intelligent robot grasping, Generative Inception Neural Network, Vector Quantized Variational Auto-Encoder, Representation based Generative Inception Neural Network.

1 Student at Center of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, Prayagraj, India-211015; priyashuklalko@gmail.com
2 Winter intern at Center of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, Prayagraj, India-211015
3 Professor at Center of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, Prayagraj, India-211015; gcnandi@iiita.ac.in
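To make concrete what generating an antipodal grasp from an RGB-D image involves downstream of the network, the sketch below decodes a single grasp pose from per-pixel output maps, following the quality/angle/width map convention used in the generative grasping literature (e.g. [5], [24]). The function name, map names, and image size are illustrative assumptions, not the exact GI-NNet interface.

```python
# Minimal sketch: picking one antipodal grasp from a generative network's
# per-pixel output maps (quality, cos 2θ, sin 2θ, width). Assumed interface,
# not the authors' code.
import numpy as np

def decode_grasp(quality, cos_2theta, sin_2theta, width):
    """Return (row, col, angle_rad, gripper_width) at the highest-quality pixel."""
    row, col = np.unravel_index(np.argmax(quality), quality.shape)
    # The angle is regressed as (sin 2θ, cos 2θ) so that θ and θ + π, which
    # describe the same antipodal grasp, map to the same regression target.
    angle = 0.5 * np.arctan2(sin_2theta[row, col], cos_2theta[row, col])
    return row, col, angle, width[row, col]

if __name__ == "__main__":
    h, w = 224, 224                                 # assumed input/output resolution
    rng = np.random.default_rng(0)
    maps = [rng.random((h, w)) for _ in range(4)]   # stand-ins for network outputs
    print(decode_grasp(*maps))
```

In a complete pipeline such an image-space grasp would still be transformed into the robot frame (hand-eye calibration, cf. [12]) before execution on the Anukul (Baxter) cobot.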

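To illustrate how a grasp-generating decoder can be coupled to a VQ-VAE for semi-supervised training, here is a minimal PyTorch sketch: unlabelled RGB-D images contribute only reconstruction and codebook losses, while the labelled split (10% to 50% of CGD in the paper) additionally supervises the grasp maps. All class names, layer sizes, codebook size, and loss weighting below are illustrative assumptions; the decoder in the paper is GI-NNet itself, represented here by a small transposed-convolution head.

```python
# Assumed semi-supervised sketch of the RGI-NNet idea, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=128, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                                   # z: (B, C, H, W)
        flat = z.permute(0, 2, 3, 1).reshape(-1, z.shape[1])
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        zq = self.codebook(idx).view(z.shape[0], z.shape[2], z.shape[3], -1)
        zq = zq.permute(0, 3, 1, 2)
        # Codebook + commitment losses, then straight-through gradient estimator.
        loss = F.mse_loss(zq, z.detach()) + self.beta * F.mse_loss(z, zq.detach())
        zq = z + (zq - z).detach()
        return zq, loss

class RGINNetSketch(nn.Module):
    def __init__(self, in_ch=4, dim=64):                    # 4 channels = RGB + depth
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1))
        self.vq = VectorQuantizer(dim=dim)
        self.recon = nn.Sequential(                          # reconstruction path
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, in_ch, 4, stride=2, padding=1))
        self.grasp = nn.Sequential(                          # grasp-map decoder head
            nn.ConvTranspose2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 4, 4, stride=2, padding=1))  # q, cos, sin, width

    def forward(self, x):
        zq, vq_loss = self.vq(self.encoder(x))
        return self.recon(zq), self.grasp(zq), vq_loss

model = RGINNetSketch()
x_unlab = torch.randn(2, 4, 224, 224)                        # unlabelled RGB-D batch
x_lab = torch.randn(2, 4, 224, 224)                          # labelled RGB-D batch
y_maps = torch.randn(2, 4, 224, 224)                         # ground-truth grasp maps

# Unlabelled images contribute only reconstruction + codebook losses ...
recon_u, _, vq_u = model(x_unlab)
unsup_loss = F.mse_loss(recon_u, x_unlab) + vq_u
# ... while the labelled split also supervises the grasp maps.
recon_l, grasp_l, vq_l = model(x_lab)
sup_loss = F.mse_loss(recon_l, x_lab) + vq_l + F.mse_loss(grasp_l, y_maps)
(unsup_loss + sup_loss).backward()
```

The straight-through estimator (`z + (zq - z).detach()`) lets gradients flow through the discrete codebook lookup, so the unlabelled data can shape the latent embeddings that the grasp decoder consumes.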
[1] Yang Zhang et al., "Fully Convolutional Grasp Detection Network with Oriented Anchor Box," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

[2] Honglak Lee et al., "Deep learning for detecting robotic grasps," Int. J. Robotics Res., 2013.

[3] Di Guo et al., "A hybrid deep architecture for robotic grasp detection," IEEE International Conference on Robotics and Automation (ICRA), 2017.

[4] Danica Kragic et al., "Robust Visual Servoing," Int. J. Robotics Res., 2003.

[5] Ferat Sahin et al., "Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020.

[6] Christopher Kanan et al., "Robotic grasp detection using deep convolutional neural networks," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.

[7] Tryambak Bhattacharjee et al., "Robotic Grasp Detection By Learning Representation in a Vector Quantized Manifold," International Conference on Signal Processing and Communications (SPCOM), 2020.

[8] Sergey Ioffe et al., "Rethinking the Inception Architecture for Computer Vision," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[9] Karun B. Shimoga et al., "Robot Grasp Synthesis Algorithms: A Survey," Int. J. Robotics Res., 1996.

[10] Anis Sahbani et al., "An overview of 3D object grasp synthesis algorithms," Robotics Auton. Syst., 2012.

[11] Ashutosh Saxena et al., "Robotic Grasping of Novel Objects using Vision," Int. J. Robotics Res., 2008.

[12] Gerd Hirzinger et al., "Optimal Hand-Eye Calibration," IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006.

[13] Patric Jensfelt et al., "Object Detection Approach for Robot Grasp Detection," International Conference on Robotics and Automation (ICRA), 2019.

[14] Jian Sun et al., "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[15] Jianbin Tang et al., "GraspNet: An Efficient Convolutional Neural Network for Real-time Grasp Detection for Low-powered Devices," IJCAI, 2018.

[16] Mirko Wächter et al., "Grasping of Unknown Objects Using Deep Convolutional Neural Networks Based on Depth Images," IEEE International Conference on Robotics and Automation (ICRA), 2018.

[17] Danica Kragic et al., "Data-Driven Grasp Synthesis—A Survey," IEEE Transactions on Robotics, 2013.

[18] Ashutosh Saxena et al., "Efficient grasping from RGBD images: Learning using a new rectangle representation," IEEE International Conference on Robotics and Automation (ICRA), 2011.

[19] Pieter Abbeel et al., "Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding," IEEE International Conference on Robotics and Automation (ICRA), 2010.

[20] Ken Goldberg et al., "On-Policy Dataset Synthesis for Learning Robot Grasping Policies Using Fully Convolutional Deep Networks," IEEE Robotics and Automation Letters, 2019.

[21] Ian Taylor et al., "Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching," IEEE International Conference on Robotics and Automation (ICRA), 2018.

[22] Jianbin Tang et al., "EnsembleNet: Improving Grasp Detection using an Ensemble of Convolutional Neural Networks," BMVC, 2018.

[23] Dumitru Erhan et al., "Going deeper with convolutions," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[24] Peter Corke et al., "Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach," Robotics: Science and Systems, 2018.

[25] Vijay Kumar et al., "Robotic grasping and contact: a review," Proceedings of the 2000 IEEE International Conference on Robotics and Automation (ICRA), 2000.

[26] Joseph Redmon et al., "Real-time grasp detection using convolutional neural networks," IEEE International Conference on Robotics and Automation (ICRA), 2015.

[27] Ales Leonardis et al., "One-shot learning and generation of dexterous grasps for novel objects," Int. J. Robotics Res., 2016.

[28] Jian Sun et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," IEEE International Conference on Computer Vision (ICCV), 2015.

[29] Hong Liu et al., "Robot grasp detection using multimodal deep convolutional neural networks," 2016.

[30] Abhinav Gupta et al., "Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours," IEEE International Conference on Robotics and Automation (ICRA), 2016.

[31] Samy Bengio et al., "Show and tell: A neural image caption generator," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.