6D Pose Estimation of 3D Objects in Scenes with Mutual Similarities and Occlusions

Estimation of six degrees of freedom (6DoF) attitude of rigid bodies is an essential issue in such fields as robotics and virtual reality. This paper proposed a method that could accurately estimate 6DoF attitude of known rigid bodies with RGB and RGB-D input image data. Additionally, the method could also work well in the scenes in which objects exhibit mutual similarities and occlusion. As one of the contributions made by this paper, a modularized assembly line method was proposed, which integrated deep learning and multi-view geometry method. At first, a neural network for instance segmentation was used to identify the general locations of known objects in the images and give the bounding boxes and masks. Then 6DoF attitude was estimated roughly according to the local features of RGB-D images and templates. Finally, purely geometric method was used to refine the estimation. Another contribution of this paper was the correction of misclassification with the help of some information reserved in the process of training the network. The proposed method achieves a superior performance on a challenging public dataset.

[1]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[2]  David G. Lowe,et al.  Local feature view clustering for 3D object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Manolis I. A. Lourakis,et al.  T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-Less Objects , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[5]  Nassir Navab,et al.  Model globally, match locally: Efficient and robust 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Tinne Tuytelaars,et al.  Discriminatively Trained Templates for 3D Object Detection: A Real Time Scalable Approach , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Vincent Lepetit,et al.  Going Further with Point Pair Features , 2016, ECCV.

[11]  Tae-Kyun Kim,et al.  Latent-Class Hough Forests for 3D Object Detection and Pose Estimation , 2014, ECCV.

[12]  Vincent Lepetit,et al.  Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes , 2011, 2011 International Conference on Computer Vision.

[13]  Eric Brachmann,et al.  Global Hypothesis Generation for 6D Object Pose Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Nassir Navab,et al.  Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation , 2016, ECCV.