6IMPOSE: Bridging the Reality Gap in 6D Pose Estimation for Robotic Grasping

6D pose estimation has been a crucial factor in the success of robotic grasping, and recent deep-learning-based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To bridge this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE comprises four modules: first, a data generation pipeline that employs the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations; second, an annotated RGBD dataset of five household objects generated with the proposed pipeline; third, a real-time, two-stage 6D pose estimation approach that integrates the object detector YOLO-V4 with a streamlined, real-time version of the 6D pose estimation algorithm PVN3D, optimized for time-sensitive robotics applications; and fourth, a codebase that facilitates the integration of the vision system into robotic grasping experiments. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% when grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by careful tuning of the data generation and domain randomization techniques and by optimization of the inference pipeline, which together overcome the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, the synthetic dataset, and all pretrained models available on GitHub.
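
To make the final step of the two-stage pipeline concrete, the following is a minimal sketch of how PVN3D-style keypoint-voting methods recover a 6D pose once an object detector such as YOLO-V4 has cropped the object region and the keypoint network has voted 3D keypoint positions in the camera frame: the pose is the least-squares rigid transform between model-frame and predicted keypoints (a Kabsch fit via SVD). The function and variable names below are illustrative assumptions for this sketch, not the actual 6IMPOSE API.

```python
# Minimal sketch of the pose-fitting step only (assumption: keypoints are given
# as Nx3 arrays in metres; this is not the 6IMPOSE codebase).
import numpy as np

def fit_pose_from_keypoints(kps_model: np.ndarray, kps_pred: np.ndarray):
    """Least-squares (R, t) minimizing sum ||R @ kps_model[i] + t - kps_pred[i]||^2."""
    mu_m, mu_p = kps_model.mean(axis=0), kps_pred.mean(axis=0)
    H = (kps_model - mu_m).T @ (kps_pred - mu_p)        # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

if __name__ == "__main__":
    # Synthetic sanity check: recover a known pose from transformed keypoints.
    rng = np.random.default_rng(0)
    kps_model = rng.uniform(-0.05, 0.05, size=(8, 3))   # 8 keypoints on a ~10 cm object
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    R_gt = Q * np.sign(np.linalg.det(Q))                # proper rotation, det = +1
    t_gt = np.array([0.1, -0.2, 0.8])                   # object ~0.8 m from the camera
    kps_pred = kps_model @ R_gt.T + t_gt                # what the keypoint network would output
    R, t = fit_pose_from_keypoints(kps_model, kps_pred)
    assert np.allclose(R, R_gt) and np.allclose(t, t_gt)
```

In practice the voted keypoints are noisy, so the same least-squares fit averages out per-keypoint errors rather than recovering the pose exactly.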
