Two-Way Complementary Tracking Guidance

Most recent high-performing Siamese network-based trackers are equipped with two independent branches: tracked-object classification and bounding-box regression. However, the two branches exchange no tracking information during optimization, which may lead to task mismatch and inconsistent accuracy between classification and regression at inference. To tackle these problems, we propose a novel Mutual Guidance (MG) strategy for visual object tracking, which builds a bidirectional, complementary exchange of tracking information between the classification and regression branches so that a well-classified tracked object is also well-localized. Specifically, the classification branch guides the regression branch to pay more attention to samples with high classification scores by re-weighting the regression loss with the classification confidence. Similarly, the regression branch guides the classifier's optimization to focus on samples with larger IoU values. Mutual Guidance is then completed by a series of regularization designs on the classification score and regression IoU, which dynamically re-assign adaptive weights to each sample's losses during joint tracking optimization. The developed MG is generic and can be easily plugged into various tracking frameworks, including anchor-based, anchor-free, and transformer-based ones, boosting their performance to some extent with negligible additional cost. In addition, we develop an adaptive localization (L) branch selection scheme to further assist trackers, which selects the proper localization branch for each tracker according to how it discriminates positive from negative samples. Extensive experiments verify the effectiveness of MGL and its superiority over state-of-the-art tracking modules on OTB100, GOT-10K, LaSOT, TrackingNet, UAV123, VOT2018 and VOT2019.
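
The two-way re-weighting described above can be illustrated with a minimal sketch. It is not the paper's formulation: the weight forms `1 + score` and `1 + IoU`, the `1 - IoU` regression loss, and the function names `bce`, `iou`, and `mutual_guidance_loss` are all assumptions chosen for illustration, and the regularization designs mentioned in the abstract are omitted. In an autograd framework the weights would typically be treated as constants so that gradients do not flow through them; that is also an assumption here.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mutual_guidance_loss(scores, pred_boxes, gt_box, labels):
    """Return (classification loss, regression loss) with mutual re-weighting.

    scores:     predicted foreground probabilities, one per sample
    pred_boxes: predicted boxes, one per sample
    labels:     1 for positive samples, 0 for negatives
    The weight forms below are hypothetical, not taken from the paper.
    """
    cls_total = 0.0
    reg_total = 0.0
    num_pos = max(1, sum(labels))
    for s, box, y in zip(scores, pred_boxes, labels):
        overlap = iou(box, gt_box)
        # Regression guides classification: positives with larger IoU weigh more.
        w_cls = 1.0 + overlap if y == 1 else 1.0
        cls_total += w_cls * bce(s, y)
        if y == 1:
            # Classification guides regression: confident positives weigh more.
            w_reg = 1.0 + s
            reg_total += w_reg * (1.0 - overlap)
    return cls_total / len(scores), reg_total / num_pos
```

With this weighting, a positive sample that is classified confidently but localized poorly contributes more to the regression loss than an equally mislocalized sample with a low score, which is the coupling between the two branches that the abstract describes.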
