Seeing Through the Occluders: Robust Monocular 6-DOF Object Pose Tracking via Model-Guided Video Object Segmentation

Dealing with occlusion is one of the most challenging problems in monocular 6-DOF object pose tracking. In this letter, we propose a novel 6-DOF object pose tracking method that is robust to heavy occlusions. When the tracked object is occluded by another object, instead of trying to detect the occluder, we seek to see through it, as if the occluder did not exist. To this end, we combine a learning-based video object segmentation module with an optimization-based pose estimation module in a closed loop. First, a model-guided video object segmentation network predicts an accurate, full mask of the object, including the occluded part. Second, a nonlinear 6-DOF pose optimization is performed under the guidance of the predicted full mask. After the current object pose is solved, the 3D object model is rendered to obtain a refined, model-constrained mask of the current frame, which is then fed back to the segmentation network for processing the next frame, closing the loop. Experiments show that the proposed method outperforms state-of-the-art methods by a large margin in handling heavy occlusions and can handle extreme cases in which previous methods fail.
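The closed loop described above (segment, optimize pose, render, feed back) can be illustrated with a deliberately simplified sketch. None of the following comes from the letter; it is a toy under stated assumptions: the 3D model is replaced by a 2D disk of known radius, the 6-DOF pose is reduced to an integer 2D translation, masks are sets of pixel coordinates, the segmentation network is replaced by a hypothetical rule that keeps visible object pixels and fills occluded pixels from the previously rendered mask, and the nonlinear pose optimizer is swapped for a brute-force IoU search. All names (`render_mask`, `segment_full_mask`, `optimize_pose`, `make_frame`, `track`) are hypothetical.

```python
"""Toy 2D sketch of a segment -> optimize -> render -> feed-back tracking loop.

Hypothetical simplifications: the 3D model is a disk of known radius, the pose
is an integer 2D translation (cx, cy), masks are sets of (x, y) pixels, and the
segmentation network is replaced by a hand-written rule.
"""
from typing import List, Set, Tuple

H, W = 64, 64   # image size in pixels (assumed)
RADIUS = 10     # radius of the hypothetical disk "model"

Pose = Tuple[int, int]
Mask = Set[Tuple[int, int]]


def render_mask(pose: Pose) -> Mask:
    """Rasterize the model silhouette at the given pose (pixels inside the image)."""
    cx, cy = pose
    return {
        (x, y)
        for x in range(max(0, cx - RADIUS), min(W, cx + RADIUS + 1))
        for y in range(max(0, cy - RADIUS), min(H, cy + RADIUS + 1))
        if (x - cx) ** 2 + (y - cy) ** 2 <= RADIUS ** 2
    }


def segment_full_mask(visible: Mask, occluder: Mask, prior: Mask) -> Mask:
    """Hypothetical stand-in for the model-guided segmentation network.

    Visible object pixels are kept as observed; inside the occluder, where the
    object cannot be seen, the mask rendered at the previous pose fills in the
    hidden part, giving a full (occlusion-free) mask.
    """
    return visible | (occluder & prior)


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def optimize_pose(target: Mask, init: Pose, search: int = 4) -> Pose:
    """Pick the translation whose rendered silhouette best matches `target`.

    A brute-force search over a (2*search+1)^2 window stands in for a
    nonlinear 6-DOF optimizer; ties keep the first candidate found.
    """
    best_pose, best_score = init, -1.0
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            pose = (init[0] + dx, init[1] + dy)
            score = iou(render_mask(pose), target)
            if score > best_score:
                best_pose, best_score = pose, score
    return best_pose


def make_frame(true_pose: Pose, occluder: Mask) -> Tuple[Mask, Mask]:
    """Synthesize an observation: the object's visible pixels and the occluder."""
    return render_mask(true_pose) - occluder, occluder


def track(frames: List[Tuple[Mask, Mask]], init_pose: Pose) -> List[Pose]:
    """Closed loop: segment -> optimize pose -> render -> feed back."""
    pose = init_pose
    prior = render_mask(pose)
    poses = []
    for visible, occluder in frames:
        full = segment_full_mask(visible, occluder, prior)
        pose = optimize_pose(full, pose)
        prior = render_mask(pose)  # refined, model-constrained mask for the next frame
        poses.append(pose)
    return poses
```

In this toy, whenever the rendered prior covers the pixels hidden by the occluder, the filled-in mask equals the unoccluded silhouette and `track` recovers each translation exactly; with a learned segmentation network, real 6-DOF motion, and noisy masks, no such exactness should be expected.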
