Triangulate geometric constraint combined with visual-flow fusion network for accurate 6DoF pose estimation

Abstract Estimating the 6D object pose based on a monocular RGB image is a challenging task in computer vision, which produces false positives under the influence of occlusion or cluttered environments. In addition, the prediction of translation is affected by changes of the image size. In this work, we present a novel two-stage method TGCPose6D for robust 6DoF object pose estimation which is composed of 2D keypoint detection and translation refinement. In the first stage, the 2D keypoint regression space is constrained by triangulate geometric feature vectors, and the low-quality prediction is suppressed by the center-heatmap weighted loss function, thereby the performance of keypoint detection is significantly improved. In the second stage, the Visual-Flow Fusion network (VFFNet) is used to extract the visual feature and optical flow feature of the rendered image and the observed image, and to predict the relative translation based on the difference of features. Specifically, the VFFNet is trained iteratively to gain the ability to predict the relative translation deviation. Extensive experiments are conducted to demonstrate the effectiveness of the proposed TGCPose6D method. Our overall pose estimation pipeline outperforms state-of-the-art object pose estimation methods on several benchmarks.

[1]  Pascal Fua,et al.  Segmentation-Driven 6D Object Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Vincent Lepetit,et al.  Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation , 2018, ECCV.

[4]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[5]  Nassir Navab,et al.  SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Eric Brachmann,et al.  Learning 6D Object Pose Estimation Using 3D Object Coordinates , 2014, ECCV.

[7]  Xiaowei Zhou,et al.  6-DoF object pose from semantic keypoints , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Weijia Zhou,et al.  DeepHMap++: Combined Projection Grouping and Correspondence Learning for Full DoF Pose Estimation , 2019, Sensors.

[9]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Nassir Navab,et al.  When Regression Meets Manifold Learning for Object Recognition and Pose Estimation , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[11]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[12]  Zoltan-Csaba Marton,et al.  Implicit 3D Orientation Learning for 6D Object Detection from RGB Images , 2018, ECCV.

[13]  Kostas Daniilidis,et al.  Single image 3D object detection and pose estimation for grasping , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[15]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Vincent Lepetit,et al.  Monocular Model-Based 3D Tracking of Rigid Objects: A Survey , 2005, Found. Trends Comput. Graph. Vis..

[17]  Siddhartha S. Srinivasa,et al.  The YCB object and Model set: Towards common benchmarks for manipulation research , 2015, 2015 International Conference on Advanced Robotics (ICAR).

[18]  Gérard G. Medioni,et al.  Object modelling by registration of multiple range images , 1992, Image Vis. Comput..

[19]  Hujun Bao,et al.  PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Lars Petersson,et al.  CullNet: Calibrated and Pose Aware Confidence Scores for Object Pose Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[21]  Pascal Fua,et al.  Real-Time Seamless Single Shot 6D Object Pose Prediction , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Jianping Fan,et al.  6D object pose estimation via viewpoint relation reasoning , 2020, Neurocomputing.

[23]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[24]  Vincent Lepetit,et al.  Gradient Response Maps for Real-Time Detection of Textureless Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Dieter Fox,et al.  PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes , 2017, Robotics: Science and Systems.

[26]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Yi Li,et al.  DeepIM: Deep Iterative Matching for 6D Pose Estimation , 2018, International Journal of Computer Vision.

[28]  Zhiguo Jiang,et al.  Real-time 6D pose estimation from a single RGB image , 2019, Image Vis. Comput..

[29]  Eric Brachmann,et al.  Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Xiaofeng Ren,et al.  Discriminative Mixture-of-Templates for Viewpoint Classification , 2010, ECCV.

[32]  Nassir Navab,et al.  Deep Model-Based 6D Pose Refinement in RGB , 2018, ECCV.

[33]  Vincent Lepetit,et al.  BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Zhiguo Jiang,et al.  Out-of-region keypoint localization for 6D pose estimation , 2020, Image Vis. Comput..