Paper Information - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (i.e. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, they are still inferior when compared with elaborate PnP/RANSAC-based approaches in terms of pose accuracy. In this work, we address this shortcoming by means of a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects which considerably enhances the accuracy of end-to-end 6D pose estimation. Our framework, named SO-Pose, takes a single RGB image as input and respectively generates 2D-3D correspondences as well as self-occlusion information harnessing a shared encoder and two separate decoders. Both outputs are then fused to directly regress the 6DoF pose parameters. Incorporating cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness, surpassing or rivaling all other state-of-the-art approaches on various challenging datasets.
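To make the terms "6DoF pose" and "2D-3D correspondence" in the abstract concrete, the following is a minimal sketch of the standard pinhole camera model, not of the SO-Pose network itself. It shows how a rotation `R` and translation `t` map a 3D point on the object model to a 2D pixel, which is the relation a 2D-3D correspondence encodes. The function names (`rotation_z`, `project`) and the intrinsics (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions and do not come from the paper.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def project(point_obj, R, t, fx, fy, cx, cy):
    """Project a 3D object-frame point to a 2D pixel under pose (R, t).

    Assumes an ideal pinhole camera without lens distortion.
    """
    # Transform into the camera frame: X_cam = R * X_obj + t
    x, y, z = (
        sum(R[i][j] * point_obj[j] for j in range(3)) + t[i]
        for i in range(3)
    )
    # Perspective division followed by the (assumed) intrinsics
    return (fx * x / z + cx, fy * y / z + cy)

if __name__ == "__main__":
    R = rotation_z(math.pi / 2)      # 3 rotational degrees of freedom (here: one axis)
    t = [0.0, 0.0, 5.0]              # 3 translational degrees of freedom
    pixel = project([1.0, 0.0, 0.0], R, t, 500.0, 500.0, 320.0, 240.0)
    print(pixel)
```

A PnP/RANSAC-based method solves this relation in reverse: given many (pixel, 3D point) pairs, it recovers `R` and `t`. A direct method, as described in the abstract, instead regresses the pose parameters from the network outputs.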
