Deep learning a grasp function for grasping under gripper pose uncertainty

This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, grasping that is robust to the gripper's pose uncertainty can be achieved by smoothing the grasp function with the pose uncertainty function. As a result, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen; instead, a pose surrounded by a region of high grasp quality will be selected. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object and outputs a score for each grasp pose across the image. Training data is generated using physics simulation and depth image simulation with 3D object meshes, enabling sufficient data to be acquired without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
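The core idea, scoring every candidate grasp pose and then convolving that score map with the gripper's pose-uncertainty distribution before selecting the maximum, can be sketched in a few lines. The 1D toy score map, the Gaussian uncertainty model, and all sizes below are illustrative assumptions for exposition, not details taken from the paper:

```python
import numpy as np

# Toy "grasp function": a score for each candidate gripper pose
# (1D here for clarity; the paper's version is over image coordinates).
scores = np.zeros(21)
scores[5] = 1.0        # a single sharp peak, adjacent to poor-quality poses
scores[12:17] = 0.8    # a broad region of consistently good grasps

# Model gripper pose uncertainty as a zero-mean Gaussian (an assumption
# for this sketch) and build a normalized smoothing kernel from it.
sigma = 2.0
offsets = np.arange(-6, 7)
kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
kernel /= kernel.sum()

# Smooth the grasp function with the pose-uncertainty function:
# each pose's score becomes its expected score under pose noise.
smoothed = np.convolve(scores, kernel, mode="same")

best_raw = int(scores.argmax())      # picks the fragile isolated peak
best_robust = int(smoothed.argmax()) # picks the centre of the good region
```

Under pose noise, executing `best_raw` often misses, since small perturbations land in the surrounding poor region; `best_robust` trades a slightly lower nominal score for a much higher expected score.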
