Learning Target Candidate Association to Keep Track of What Not to Track

The presence of objects that are confusingly similar to the tracked target, poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach.We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame-to-frame. To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state-of-the-art on six benchmarks, achieving an AUC score of 67.1% on LaSOT [21] and a +5.8% absolute gain on the OxUvA long-term dataset [41]. The code and trained models are available at https://github.com/visionml/pytracking

[1]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[2]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Yiming Li,et al.  AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Philip H. S. Torr,et al.  Siam R-CNN: Visual Tracking by Re-Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Huchuan Lu,et al.  ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-Term Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Jiri Matas,et al.  A Novel Performance Evaluation Methodology for Single-Target Trackers , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[8]  Philip H. S. Torr,et al.  The Eighth Visual Object Tracking VOT2020 Challenge Results , 2020, ECCV Workshops.

[9]  Ming Tang,et al.  Learning Feature Embeddings for Discriminant Model Based Tracking , 2019, ECCV.

[10]  Junseok Kwon,et al.  Visual Tracking by TridentAlign and Context Embedding , 2020, ACCV.

[11]  Torsten Sattler,et al.  D2-Net: A Trainable CNN for Joint Description and Detection of Local Features , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Shengping Zhang,et al.  Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[14]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[15]  Zhipeng Zhang,et al.  Ocean: Object-aware Anchor-free Tracking , 2020, ECCV.

[16]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[17]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Bin Yan,et al.  Alpha-Refine: Boosting Tracking Performance by Precise Bounding Box Estimation , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  L. Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Zhengqi Li,et al.  MegaDepth: Learning Single-View Depth Prediction from Internet Photos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Ying Cui,et al.  Graph Attention Tracking , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Gérard G. Medioni,et al.  Multiple Target Tracking Using Spatio-Temporal Markov Chain Monte Carlo Data Association , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Huchuan Lu,et al.  Learning regression and verification networks for long-term visual tracking , 2018, ArXiv.

[26]  Stefan Roth,et al.  MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking , 2020, International Journal of Computer Vision.

[27]  Tianzhu Zhang,et al.  Graph Convolutional Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xin Zhao,et al.  GlobalTrack: A Simple and Strong Baseline for Long-term Tracking , 2019, AAAI.

[29]  Laura Leal-Taixé,et al.  Tracking Without Bells and Whistles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Zhi Zhang,et al.  TLPG-Tracker: Joint Learning of Target Localization and Proposal Generation for Visual Tracking , 2020, IJCAI.

[31]  Weilin Huang,et al.  Deformable Siamese Attention Networks for Visual Object Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[33]  SuperGlue: Learning Feature Matching With Graph Neural Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Luc Van Gool,et al.  Probabilistic Regression for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Ramakant Nevatia,et al.  Global data association for multi-object tracking using network flows , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Ling Shao,et al.  CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers , 2020, ECCV.

[38]  Simon Lucey,et al.  Need for Speed: A Benchmark for Higher Frame Rate Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Richard Sinkhorn,et al.  Concerning nonnegative matrices and doubly stochastic matrices , 1967 .

[40]  L. Leal-Taix'e,et al.  Learning a Neural Solver for Multiple Object Tracking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Bernt Schiele,et al.  Multiple People Tracking by Lifted Multicut and Person Re-identification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Ying Cui,et al.  SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Wengang Zhou,et al.  Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[45]  Zhenyu He,et al.  The Seventh Visual Object Tracking VOT2019 Challenge Results , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[46]  Huchuan Lu,et al.  Transformer Tracking , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Qingjie Liu,et al.  STMTrack: Template-free Visual Tracking with Space-time Memory Networks , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Rongrong Ji,et al.  Siamese Box Adaptive Network for Visual Tracking , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Yuan Liu,et al.  Object Tracking Using Spatio-Temporal Networks for Future Prediction Location , 2020, ECCV.

[50]  Yaonong Wang,et al.  PG-Net: Pixel to Global Matching Network for Visual Tracking , 2020, ECCV.

[51]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Michael Felsberg,et al.  Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Tomasz Malisiewicz,et al.  SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[54]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[55]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Dong Wang,et al.  High-Performance Long-Term Tracking With Meta-Updater , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Zhiwei Xiong,et al.  Tracking by Instance Detection: A Meta-Learning Approach , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Radu Timofte,et al.  How to Train Your Energy-Based Model for Regression , 2020, BMVC.

[59]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[60]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Ales Leonardis,et al.  Distractor-Supported Single Target Tracking in Extremely Cluttered Scenes , 2016, ECCV.

[62]  Gang Yu,et al.  SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines , 2019, AAAI.

[63]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[65]  L. Gool,et al.  Know Your Surroundings: Exploiting Scene Information for Object Tracking , 2020, ECCV.

[66]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Pengfei Xu,et al.  ROAM: Recurrently Optimizing Tracking Model , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Lin Yuan,et al.  LaSOT: A High-quality Large-scale Single Object Tracking Benchmark , 2020, International Journal of Computer Vision.

[70]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Arnold W. M. Smeulders,et al.  Long-term Tracking in the Wild: A Benchmark , 2018, ECCV.