Paper Information - Single Shot 6D Object Pose Estimation

Single Shot 6D Object Pose Estimation

In this paper, we introduce a novel single shot approach for 6D pose estimation of rigid objects based on depth images. For this purpose, a fully convolutional neural network is employed: the 3D input data is spatially discretized, and pose estimation is treated as a regression task that is solved locally on the resulting volume elements. Running at 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D poses of multiple objects in the image simultaneously. Our approach does not require real-world datasets with manual 6D pose annotations and transfers to the real world despite being trained entirely on synthetic data. The proposed method is evaluated on public benchmark datasets, where we demonstrate that it significantly outperforms state-of-the-art methods.
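The idea of discretizing the input and regressing a pose locally per element can be illustrated with a minimal sketch. This is not the OP-Net implementation: the 2D grid over the image plane, the per-cell layout `[confidence, x, y, z, r1, r2, r3]`, the confidence threshold, and the names `cell_of_pixel`, `decode_grid`, and `CellPose` are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class CellPose(NamedTuple):
    """One pose hypothesis read out of a single grid cell (hypothetical layout)."""
    row: int
    col: int
    confidence: float
    position: Tuple[float, ...]     # (x, y, z), units are an assumption
    orientation: Tuple[float, ...]  # three rotation parameters, an assumption


def cell_of_pixel(u: int, v: int, width: int, height: int, grid_size: int) -> Tuple[int, int]:
    """Map pixel (u, v) of a width x height image to its (row, col) grid cell."""
    return (v * grid_size // height, u * grid_size // width)


def decode_grid(pred: Sequence[Sequence[Sequence[float]]],
                threshold: float = 0.5) -> List[CellPose]:
    """Turn a per-cell output grid into a list of pose hypotheses.

    pred[row][col] is assumed to hold [confidence, x, y, z, r1, r2, r3].
    Every cell whose confidence reaches the threshold contributes one pose,
    so several objects are read out in a single pass over the grid.
    """
    poses = []
    for r, row in enumerate(pred):
        for c, cell in enumerate(row):
            if cell[0] >= threshold:
                poses.append(CellPose(r, c, cell[0], tuple(cell[1:4]), tuple(cell[4:7])))
    # Most confident hypotheses first.
    poses.sort(key=lambda p: p.confidence, reverse=True)
    return poses


# Toy 3x3 grid with two confident cells and one cell below the threshold.
grid = [[[0.0] * 7 for _ in range(3)] for _ in range(3)]
grid[0][1] = [0.9, 0.10, 0.20, 0.80, 0.0, 0.0, 1.57]
grid[2][2] = [0.6, -0.30, 0.10, 0.75, 0.1, 0.0, 0.00]
grid[1][1] = [0.2, 0.00, 0.00, 0.70, 0.0, 0.0, 0.00]
poses = decode_grid(grid)
```

In this toy grid, `poses` holds the two hypotheses from cells (0, 1) and (2, 2), ordered by confidence; the cell at (1, 1) is discarded. Because each cell is decoded independently, the readout cost depends only on the grid size and not on the number of objects in the scene.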

Marco F. Huber | Kilian Kleeberger
