Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

To reduce data collection time for deep learning of robust robotic grasp plans, we explore training from a synthetic dataset of 6.7 million point clouds, grasps, and analytic grasp metrics generated from thousands of 3D models from Dex-Net 1.0 in randomized poses on a table. We use the resulting dataset, Dex-Net 2.0, to train a Grasp Quality Convolutional Neural Network (GQ-CNN) model that rapidly predicts the probability of grasp success from depth images, where grasps are specified as the planar position, angle, and depth of a gripper relative to an RGB-D sensor. Experiments with over 1,000 trials on an ABB YuMi comparing grasp planning methods on singulated objects suggest that a GQ-CNN trained with only synthetic data from Dex-Net 2.0 can be used to plan grasps in 0.8 s with a success rate of 93% on eight known objects with adversarial geometry, and is 3x faster than registering point clouds to a precomputed dataset of objects and indexing grasps. The Dex-Net 2.0 grasp planner also has the highest success rate on a dataset of 10 novel rigid objects and achieves 99% precision (one false positive out of 69 grasps classified as robust) on a dataset of 40 novel household objects, some of which are articulated or deformable. Code, datasets, videos, and supplementary material are available at http://berkeleyautomation.github.io/dex-net.
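Before a depth image is fed to a grasp-quality network, a grasp candidate specified as a planar position, angle, and depth is typically normalized by extracting an image patch centered on the grasp and rotated so the grasp axis is aligned with a fixed direction. The sketch below illustrates this kind of grasp-centric preprocessing with plain NumPy nearest-neighbor sampling; the function name, patch size, and sampling scheme are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def align_grasp_patch(depth_img, center, angle, out_size=32):
    """Extract an out_size x out_size patch from depth_img, centered at
    `center` = (row, col) and rotated by `angle` radians so the grasp axis
    lies along the patch's horizontal axis.

    Illustrative sketch (nearest-neighbor sampling, edge clamping);
    a real pipeline would interpolate and handle depth normalization.
    """
    h, w = depth_img.shape
    cy, cx = center
    c, s = np.cos(angle), np.sin(angle)
    # output-patch coordinates relative to the patch center
    ys, xs = np.mgrid[0:out_size, 0:out_size] - out_size / 2.0
    # rotate patch coordinates back into the image frame
    src_x = cx + c * xs - s * ys
    src_y = cy + s * xs + c * ys
    # nearest-neighbor lookup, clamped to the image bounds
    src_x = np.clip(np.round(src_x).astype(int), 0, w - 1)
    src_y = np.clip(np.round(src_y).astype(int), 0, h - 1)
    return depth_img[src_y, src_x]
```

With `angle = 0` this reduces to a plain centered crop, so rotating the grasp rather than the network's filters lets a single small CNN evaluate candidates at arbitrary orientations.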
