Single Shot 6D Object Pose Estimation

In this paper, we introduce a novel single shot approach for 6D object pose estimation of rigid objects based on depth images. For this purpose, a fully convolutional neural network is employed, where the 3D input data is spatially discretized and pose estimation is considered as a regression task that is solved locally on the resulting volume elements. With 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D pose of multiple objects in the image simultaneously. Our approach does not require manually 6D pose-annotated real-world datasets and transfers to the real world, although being entirely trained on synthetic data. The proposed method is evaluated on public benchmark datasets, where we can demonstrate that state-of-the-art methods are significantly outperformed.

[1]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[2]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[3]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Vincent Lepetit,et al.  Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes , 2011, 2011 International Conference on Computer Vision.

[5]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[6]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[8]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[9]  Roberto Cipolla,et al.  Robust Instance Recognition in Presence of Occlusion and Clutter , 2014, ECCV.

[10]  Tae-Kyun Kim,et al.  Latent-Class Hough Forests for 3D Object Detection and Pose Estimation , 2014, ECCV.

[11]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Jitendra Malik,et al.  Aligning 3D models to RGB-D images of cluttered scenes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Joseph Redmon,et al.  Real-time grasp detection using convolutional neural networks , 2014, 2015 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[17]  Tae-Kyun Kim,et al.  Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Manolis I. A. Lourakis,et al.  T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[20]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  René Vidal,et al.  3D Pose Regression Using Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  James L. Crowley,et al.  Defining the Pose of Any 3D Rigid Object and an Associated Distance , 2016, International Journal of Computer Vision.

[28]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[30]  James L. Crowley,et al.  Symmetry Aware Evaluation of 3D Object Detection and Pose Estimation in Scenes of Many Parts in Bulk , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[31]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Tae-Kyun Kim,et al.  Multi-Task Deep Networks for Depth-Based 6D Object Pose and Joint Registration in Crowd Scenarios , 2018, BMVC.

[33]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[34]  Zoltan-Csaba Marton,et al.  Implicit 3D Orientation Learning for 6D Object Detection from RGB Images , 2018, ECCV.

[35]  Eric Brachmann,et al.  BOP: Benchmark for 6D Object Pose Estimation , 2018, ECCV.

[36]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Bolei Zhou,et al.  Real-Time Object Pose Estimation with Pose Interpreter Networks , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[38]  Ian D. Reid,et al.  Deep-6DPose: Recovering 6D Object Pose from a Single RGB Image , 2018, ArXiv.

[39]  Sven Behnke,et al.  Robust 6D Object Pose Estimation in Cluttered Scenes Using Semantic Segmentation and Pose Regression Networks , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[40]  Marco F. Huber,et al.  Large-scale 6D Object Pose Estimation Dataset for Industrial Bin-Picking , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[41]  Hui Cheng,et al.  PPR-Net:Point-wise Pose Regression Network for Instance Segmentation and 6D Pose Estimation in Bin-picking Scenarios , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[42]  Hujun Bao,et al.  PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Timothy Bretl,et al.  PoseRBPF: A Rao-Blackwellized Particle Filter for6D Object Pose Estimation , 2019, Robotics: Science and Systems.

[44]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.