HFF6D: Hierarchical Feature Fusion Network for Robust 6D Object Pose Tracking

Tracking the 6-degree-of-freedom (6D) pose of an object in video sequences is gaining attention because of its wide range of applications in multimedia and robotic manipulation. However, current methods often perform poorly in challenging scenes, such as those with an incorrect initial pose, sudden re-orientation, or severe occlusion. To address this, we present a robust 6D object pose tracking method built on a novel hierarchical feature fusion network, referred to as HFF6D, which predicts the object's relative pose between adjacent frames. Instead of extracting features from adjacent frames separately, HFF6D establishes sufficient spatial-temporal information interaction between them. In addition, we propose a novel subtraction feature fusion (SFF) module with an attention mechanism that leverages feature subtraction during feature fusion. It explicitly highlights the feature differences between adjacent frames, thus improving the robustness of relative pose estimation in challenging scenes. Furthermore, we use data augmentation so that HFF6D can be trained only on synthetic data and still be applied effectively in the real world, thereby reducing the manual effort of data annotation. We evaluate HFF6D on the well-known YCB-Video and YCBInEOAT datasets. Quantitative and qualitative results demonstrate that HFF6D outperforms state-of-the-art (SOTA) methods in both accuracy and efficiency. Moreover, it is shown to track robustly in the challenging scenes mentioned above.
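
The two ideas named in the abstract, tracking by chaining relative poses between adjacent frames and fusing features through their difference, can be illustrated with a minimal NumPy sketch. This is not the HFF6D network: the function names, the left-multiplication convention for the relative pose, the (channels, points) feature layout, and the hand-written sigmoid gate standing in for a learned attention module are all assumptions made for illustration.

```python
import numpy as np


def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def update_pose(prev_pose, relative_pose):
    """Chain the predicted relative pose onto the previous frame's pose.

    Assumed convention: the relative pose is expressed in the camera frame,
    so it is applied by left-multiplication.
    """
    return relative_pose @ prev_pose


def subtraction_fusion(feat_prev, feat_curr):
    """Toy subtraction-based fusion of two (channels, points) feature maps.

    The per-channel gate below is a fixed sigmoid of the mean absolute
    difference; it is a hypothetical stand-in for a learned attention module.
    """
    diff = feat_curr - feat_prev                      # explicit frame-to-frame change
    score = np.abs(diff).mean(axis=1, keepdims=True)  # one score per channel
    gate = 1.0 / (1.0 + np.exp(-score))               # values in (0.5, 1)
    return np.concatenate([feat_curr, gate * diff], axis=0)


# Pose chaining: 0.1 rad at frame t-1 plus a 0.2 rad relative rotation.
pose_prev = make_pose(rot_z(0.1), [0.0, 0.0, 0.5])
delta = make_pose(rot_z(0.2), [0.01, 0.0, 0.0])
pose_curr = update_pose(pose_prev, delta)

# Feature fusion: identical frames yield an all-zero difference branch.
features = np.ones((4, 5))
fused_same = subtraction_fusion(features, features)
fused_changed = subtraction_fusion(np.zeros((4, 5)), features)
```

In this sketch the difference branch vanishes when two frames carry identical features and grows with the change between them, which is the sense in which subtraction makes inter-frame differences explicit before a relative pose is regressed.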
