Reinforced Similarity Learning: Siamese Relation Networks for Robust Object Tracking

Recently, Siamese networks based tracking algorithms have shown favorable performance. Latest work focuses on better feature embedding and target state estimation, which greatly improves the accuracy. Nevertheless, the simple cross-correlation operation of the features between a fixed template and the search region limits their robustness and discrimination capability. In this paper, we pay more attention to learn an outstanding similarity measure for robust tracking. We propose a novel relation network that can be integrated on top of previous trackers without any need for further training of the siamese networks, which achieves a superior discriminative ability. During online inference, we utilize the feedback from high-confidence tracking results to obtain an additional template and update it, which improves the robustness and generalization. We implement two versions of the proposed approach with the SiamFC-based tracker and SiamRPN-based tracker to validate the strong compatibility of our algorithm. Extensive experimental results on several tracking benchmarks indicate that the proposed method can effectively improve the performance and robustness of the underlying trackers without reducing speed too much, and performs superiorly against the state-of-the-art trackers.

[1]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Ming-Hsuan Yang,et al.  Learning Spatial-Aware Regressions for Visual Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Peng Lu,et al.  Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[4]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[5]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Peng Gao,et al.  Learning Cascaded Siamese Networks for High Performance Visual Tracking , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[7]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jungong Han,et al.  ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Huchuan Lu,et al.  Learning regression and verification networks for long-term visual tracking , 2018, ArXiv.

[11]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[12]  Chong Luo,et al.  Towards a Better Match in Siamese Network Based Visual Object Tracker , 2018, ECCV Workshops.

[13]  Yichen Wei,et al.  Learning Region Features for Object Detection , 2018, ECCV.

[14]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[15]  Luc Van Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Zhiwei Xiong,et al.  SPM-Tracker: Series-Parallel Matching for Real-Time Visual Object Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[18]  Dawei Zhang,et al.  Joint Representation Learning with Deep Quadruplet Network for Real-Time Visual Tracking , 2020, 2020 International Joint Conference on Neural Networks (IJCNN).

[19]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Zhonglong Zheng,et al.  High Performance Visual Tracking With Siamese Actor-Critic Network , 2020, 2020 IEEE International Conference on Image Processing (ICIP).

[21]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[26]  Philip H.S. Torr,et al.  Siam R-CNN: Visual Tracking by Re-Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[28]  Fan Yang,et al.  LaSOT: A High-Quality Benchmark for Large-Scale Single Object Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Haibin Ling,et al.  Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yue Cao,et al.  Spatial-Temporal Relation Networks for Multi-Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Junseok Kwon,et al.  Deep Meta Learning for Real-Time Target-Aware Visual Tracking , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Huchuan Lu,et al.  GradNet: Gradient-Guided Network for Visual Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Liyuan Chen,et al.  Learning Fine-Grained Similarity Matching Networks for Visual Tracking , 2020, ICMR.

[37]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[38]  Fahad Shahbaz Khan,et al.  Learning the Model Update for Siamese Trackers , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Stephen Lin,et al.  Local Relation Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Yichen Wei,et al.  Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Haibin Ling,et al.  Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[43]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[44]  Bingbing Ni,et al.  Deep Regression Tracking with Shrinkage Loss , 2018, ECCV.

[45]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[48]  Cordelia Schmid,et al.  Actor-Centric Relation Network , 2018, ECCV.

[49]  Zuoxin Li,et al.  SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines , 2020, AAAI.

[50]  Yong Liu,et al.  Large Margin Object Tracking with Circulant Feature Maps , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Huchuan Lu,et al.  Visual Tracking via Adaptive Spatially-Regularized Correlation Filters , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[54]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.