Homography Decomposition Networks for Planar Object Tracking

Planar object tracking plays an important role in AI applications such as robotics, visual servoing, and visual SLAM. Although previous planar trackers work well in most scenarios, tracking remains challenging under rapid motion and large transformations between two consecutive frames. The underlying cause is that the condition number of this non-linear system becomes unstable as the search range over the homography parameter space grows. To this end, we propose a novel Homography Decomposition Networks (HDN) approach that drastically reduces and stabilizes the condition number by decomposing the homography transformation into two groups. Specifically, a similarity transformation estimator built on a deep convolutional equivariant network robustly predicts the first group. Given these high-confidence scale and rotation estimates, a simple regression model then estimates the residual transformation. Furthermore, the proposed end-to-end network is trained in a semi-supervised fashion. Extensive experiments show that the proposed approach outperforms state-of-the-art planar tracking methods by a large margin on the challenging POT, UCSB, and POIC datasets. Code and models are available at https://github.com/zhanxinrui/HDN.
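To make the two-group decomposition concrete, the following NumPy sketch splits a known homography H into a similarity part S (scale, rotation, translation) and a residual part R such that H = R S. It illustrates the algebra only and is not the HDN networks: the abstract does not specify the parametrization, so the composition order, the closed-form choice of scale and rotation from the upper-left 2x2 block, and the function names `similarity_matrix` and `split_homography` are all assumptions made here for illustration.

```python
import numpy as np


def similarity_matrix(scale, theta, tx, ty):
    """3x3 similarity transform: uniform scale, rotation, and translation."""
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])


def split_homography(H):
    """Split H into (S, R) with H = R @ S (hypothetical helper, not from HDN).

    Assumes H[2, 2] != 0 and that the upper-left 2x2 block of H has a
    nonzero determinant, so that S is invertible.
    """
    H = np.asarray(H, dtype=float)
    H = H / H[2, 2]                      # fix the projective scale
    A = H[:2, :2]
    # Illustrative choice: scale from the area change of the 2x2 block,
    # rotation from the nearest rotation matrix to that block.
    scale = np.sqrt(abs(np.linalg.det(A)))
    theta = np.arctan2(A[1, 0] - A[0, 1], A[0, 0] + A[1, 1])
    S = similarity_matrix(scale, theta, H[0, 2], H[1, 2])
    R = H @ np.linalg.inv(S)             # whatever the similarity does not explain
    return S, R


if __name__ == "__main__":
    H = np.array([[1.8, -0.9, 12.0],
                  [1.0,  1.7, -4.0],
                  [1e-3, 2e-3, 1.0]])
    S, R = split_homography(H)
    print("similarity part:\n", S)
    print("residual part:\n", R)
    print("recomposition error:", np.abs(R @ S - H).max())
```

When H is itself a pure similarity, the residual returned by this sketch is the identity, which matches the intuition that the second group only has to account for what scale, rotation, and translation leave unexplained.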
