Paper information - Learning 6-DOF Grasping Interaction via Deep Geometry-Aware 3D Representations

Learning 6-DOF Grasping Interaction via Deep Geometry-Aware 3D Representations

This paper focuses on the problem of learning 6-DOF grasping with a parallel jaw gripper in simulation. Our key idea is to constrain and regularize grasping interaction learning through 3D geometry prediction. We introduce a deep geometry-aware grasping network (DGGN) that decomposes the learning into two steps. First, we learn to build a mental geometry-aware representation by reconstructing the scene (i.e., a 3D occupancy grid) from RGBD input via generative 3D shape modeling. Second, we learn to predict the grasping outcome from this internal geometry-aware representation. The learned outcome prediction model is then used to sequentially propose grasping solutions via analysis-by-synthesis optimization. Our contributions are fourfold: (1) to the best of our knowledge, we present for the first time a method to learn a 6-DOF grasping net from RGBD input; (2) we build a grasping dataset from demonstrations in virtual reality with rich sensory and interaction annotations; this dataset includes 101 everyday objects spread across 7 categories, and we additionally propose a data augmentation strategy for effective learning; (3) we demonstrate that the learned geometry-aware representation leads to about 10% relative performance improvement over the baseline CNN on grasping objects from our dataset; and (4) we further demonstrate that the model generalizes to novel viewpoints and object instances.
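As a rough illustration of the last step, proposing grasps by optimizing against an outcome predictor, the following is a minimal sketch and not the paper's implementation. It assumes a cross-entropy-style sampler as the optimizer (the abstract only says "analysis-by-synthesis optimization"), and it replaces the learned network with a hypothetical toy function `toy_outcome_model`. The names `propose_grasp` and `target`, and all hyperparameters, are invented for the example.

```python
import math
import random


def toy_outcome_model(grasp, target):
    """Hypothetical stand-in for a learned grasp outcome predictor.

    Returns a pseudo success probability in (0, 1] that peaks when the
    6-DOF grasp (x, y, z, roll, pitch, yaw) matches `target`.
    """
    sq_dist = sum((g - t) ** 2 for g, t in zip(grasp, target))
    return math.exp(-sq_dist)


def propose_grasp(score_fn, init_mean, init_std=0.5, n_samples=64,
                  n_elite=8, n_iters=20, seed=0):
    """Cross-entropy-style search over 6-DOF grasp parameters (assumed optimizer)."""
    rng = random.Random(seed)
    mean = list(init_mean)
    std = [init_std] * len(mean)
    best, best_score = list(mean), score_fn(mean)
    for _ in range(n_iters):
        # Synthesize candidate grasps around the current estimate.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(n_samples)]
        # Analyze the candidates with the outcome predictor; keep the best few.
        elites = sorted(candidates, key=score_fn, reverse=True)[:n_elite]
        top_score = score_fn(elites[0])
        if top_score > best_score:
            best, best_score = elites[0], top_score
        # Refit the sampling distribution to the elite candidates.
        dims = list(zip(*elites))
        mean = [sum(d) / n_elite for d in dims]
        std = [max(1e-3, math.sqrt(sum((v - m) ** 2 for v in d) / n_elite))
               for d, m in zip(dims, mean)]
    return best, best_score


# Toy usage: the "ideal" grasp is known here only because the model is synthetic.
target = (0.2, -0.1, 0.3, 0.0, 0.5, -0.4)
score = lambda g: toy_outcome_model(g, target)
start = [0.0] * 6
grasp, p = propose_grasp(score, start)
```

In a real system the scoring function would be the trained outcome predictor conditioned on the RGBD observation and its internal 3D representation. The sampler itself needs nothing from the model beyond a scalar score per candidate grasp.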
