SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking

The greatest challenge facing visual object tracking is the simultaneous requirements on robustness and discrimination power. In this paper, we propose a SiamFC-based tracker, named SPM-Tracker, to tackle this challenge. The basic idea is to address the two requirements in two separate matching stages. Robustness is strengthened in the coarse matching (CM) stage through generalized training while discrimination power is enhanced in the fine matching (FM) stage through a distance learning network. The two stages are connected in series as the input proposals of the FM stage are generated by the CM stage. They are also connected in parallel as the matching scores and box location refinements are fused to generate the final results. This innovative series-parallel structure takes advantage of both stages and results in superior performance. The proposed SPM-Tracker, running at 120fps on GPU, achieves an AUC of 0.687 on OTB-100 and an EAO of 0.434 on VOT-16, exceeding other real-time trackers by a notable margin.

[1]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[2]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[3]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[5]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[8]  Ming Tang,et al.  High-Speed Tracking with Multi-kernel Correlation Filters , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yong Liu,et al.  Large Margin Object Tracking with Circulant Feature Maps , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Chong Luo,et al.  Towards a Better Match in Siamese Network Based Visual Object Tracker , 2018, ECCV Workshops.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[16]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bohyung Han,et al.  Modeling and Propagating CNNs in a Tree Structure for Visual Tracking , 2016, ArXiv.

[18]  Bohyung Han,et al.  Real-Time MDNet , 2018, ECCV.

[19]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Huchuan Lu,et al.  Learning regression and verification networks for long-term visual tracking , 2018, ArXiv.

[23]  Rynson W. H. Lau,et al.  CREST: Convolutional Residual Learning for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Ming-Hsuan Yang,et al.  Long-term correlation tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Shiguang Shan,et al.  Joint Representation and Truncated Inference Learning for Correlation Filter based Tracking , 2018, ECCV.

[26]  Bruce A. Draper,et al.  Visual object tracking using adaptive correlation filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[27]  Jiwen Lu,et al.  Deep Reinforcement Learning with Iterative Shift for Visual Tracking , 2018, ECCV.

[28]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Nanning Zheng,et al.  Description-Discrimination Collaborative Tracking , 2014, ECCV.

[30]  Bernt Schiele,et al.  CityPersons: A Diverse Dataset for Pedestrian Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Antoni B. Chan,et al.  Learning Dynamic Memory Networks for Object Tracking , 2018, ECCV.

[34]  Jin Gao,et al.  Transfer Learning Based Visual Tracking with Gaussian Processes Regression , 2014, ECCV.

[35]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Ming-Hsuan Yang,et al.  Learning Spatial-Aware Regressions for Visual Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[39]  Huchuan Lu,et al.  Real-Time 'Actor-Critic' Tracking , 2018, ECCV.

[40]  Michael Felsberg,et al.  Adaptive Color Attributes for Real-Time Visual Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Jianke Zhu,et al.  A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration , 2014, ECCV Workshops.

[43]  Qi Tian,et al.  Multi-cue Correlation Filters for Robust Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45]  Simon Lucey,et al.  Learning Policies for Adaptive Tracking with Deep Feature Cascades , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Huchuan Lu,et al.  Structured Siamese Network for Real-Time Visual Tracking , 2018, ECCV.

[47]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[48]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[49]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Haibin Ling,et al.  Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[51]  Bingbing Ni,et al.  Deep Regression Tracking with Shrinkage Loss , 2018, ECCV.

[52]  Rui Caseiro,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence High-speed Tracking with Kernelized Correlation Filters , 2022 .

[53]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[54]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[55]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[59]  Bernard Ghanem,et al.  TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild , 2018, ECCV.

[60]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).