Siamese Keypoint Prediction Network for Visual Object Tracking

Visual object tracking aims to estimate the location of an arbitrary target in a video sequence given its initial bounding box. By utilizing offline feature learning, the siamese paradigm has recently been the leading framework for high performance tracking. However, current existing siamese trackers either heavily rely on complicated anchor-based detection networks or lack the ability to resist to distractors. In this paper, we propose the Siamese keypoint prediction network (SiamKPN) to address these challenges. Upon a Siamese backbone for feature embedding, SiamKPN benefits from a cascade heatmap strategy for coarse-to-fine prediction modeling. In particular, the strategy is implemented by sequentially shrinking the coverage of the label heatmap along the cascade to apply loose-to-strict intermediate supervisions. During inference, we find the predicted heatmaps of successive stages to be gradually concentrated to the target and reduced to the distractors. SiamKPN performs well against state-of-the-art trackers for visual object tracking on four benchmark datasets including OTB-100, VOT2018, LaSOT and GOT-10k, while running at real-time speed.

[1]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Huchuan Lu,et al.  Deep visual tracking: Review and experimental comparison , 2018, Pattern Recognit..

[5]  Rynson W. H. Lau,et al.  CREST: Convolutional Residual Learning for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Feng Li,et al.  Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[10]  Luc Van Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Zhiwei Xiong,et al.  SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[15]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking , 2015, IEEE Transactions on Image Processing.

[17]  Haibin Ling,et al.  Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[19]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[20]  Huchuan Lu,et al.  Correlation Tracking via Joint Discrimination and Reliability Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Yiannis Demiris,et al.  Context-Aware Deep Feature Compression for High-Speed Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Bruce A. Draper,et al.  Visual object tracking using adaptive correlation filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[27]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[28]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Bingbing Ni,et al.  Deep Regression Tracking with Shrinkage Loss , 2018, ECCV.

[30]  Huchuan Lu,et al.  Structured Siamese Network for Real-Time Visual Tracking , 2018, ECCV.

[31]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[32]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[38]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Josef Kittler,et al.  Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking , 2018, IEEE Transactions on Image Processing.

[40]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[41]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[42]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[43]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[45]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[47]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[48]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[51]  Xiaogang Wang,et al.  STCT: Sequentially Training Convolutional Networks for Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[54]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[55]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  Haibin Ling,et al.  Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).