Dex-Net 1.0: A cloud-based network of 3D objects for robust grasp planning using a Multi-Armed Bandit model with correlated rewards

This paper presents the Dexterity Network (Dex-Net) 1.0, a dataset of 3D object models and a sampling-based planning algorithm to explore how Cloud Robotics can be used for robust grasp planning. The algorithm uses a Multi-Armed Bandit model with correlated rewards to leverage prior grasps and 3D object models in a growing dataset that currently includes over 10,000 unique 3D object models and 2.5 million parallel-jaw grasps. Each grasp includes an estimate of the probability of force closure under uncertainty in object and gripper pose and friction. Dex-Net 1.0 uses Multi-View Convolutional Neural Networks (MV-CNNs), a new deep learning method for 3D object classification, to provide a similarity metric between objects, and the Google Cloud Platform to simultaneously run up to 1,500 virtual cores, reducing experiment runtime by up to three orders of magnitude. Experiments suggest that correlated bandit techniques can use a cloud-based network of object models to significantly reduce the number of samples required for robust grasp planning. We report on system sensitivity to variations in similarity metrics and in uncertainty in pose and friction. Code and updated information are available at http://berkeleyautomation.github.io/dex-net/.
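
To make the idea of a bandit with correlated rewards concrete, the following is a minimal sketch and not the Dex-Net implementation. It assumes each candidate grasp is a Bernoulli arm with a Beta posterior over its success probability, that arms are chosen by Thompson sampling, and that each observed outcome is shared with every other arm in proportion to a squared-exponential similarity between feature vectors. The names `kernel`, `correlated_thompson`, `features`, and `pull` are hypothetical and chosen only for illustration; the abstract does not specify these details.

```python
import math
import random


def kernel(x, y, bandwidth=1.0):
    """Squared-exponential similarity between two feature vectors, in (0, 1]."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def correlated_thompson(features, pull, num_pulls, prior=(1.0, 1.0), seed=0):
    """Thompson sampling over Bernoulli arms with kernel-weighted Beta updates.

    features: one feature vector per candidate grasp (hypothetical encoding).
    pull: callable(arm_index) -> 0 or 1, one sampled success or failure.
    Returns the per-arm posterior means after num_pulls samples.
    """
    rng = random.Random(seed)
    n = len(features)
    alpha = [prior[0]] * n
    beta = [prior[1]] * n
    for _ in range(num_pulls):
        # Draw one plausible success probability per arm and pick the best.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        chosen = max(range(n), key=draws.__getitem__)
        outcome = pull(chosen)
        # Share the observation with every arm, weighted by similarity
        # to the arm that was actually sampled (weight 1 for the arm itself).
        for i in range(n):
            w = kernel(features[i], features[chosen])
            alpha[i] += w * outcome
            beta[i] += w * (1 - outcome)
    return [alpha[i] / (alpha[i] + beta[i]) for i in range(n)]
```

In this sketch, an outcome observed on one grasp also shifts the posteriors of similar grasps, so arms close in feature space need fewer samples of their own. Here `pull` stands in for one sampled evaluation of a grasp under perturbed pose and friction, and the kernel stands in for the object and grasp similarity metric.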
