SiamMan: Siamese Motion-aware Network for Visual Tracking

In this paper, we present a novel siamese motion-aware network (SiamMan) for visual tracking, which consists of the siamese feature extraction subnetwork, followed by the classification, regression, and localization branches in parallel. The classification branch is used to distinguish the foreground from background, and the regression branch is adopt to regress the bounding box of target. To reduce the impact of manually designed anchor boxes to adapt to different target motion patterns, we design the localization branch, which aims to coarsely localize the target to help the regression branch to generate accurate results. Meanwhile, we introduce the global context module into the localization branch to capture long-range dependency for more robustness in large displacement of target. In addition, we design a multi-scale learnable attention module to guide these three branches to exploit discriminative features for better performance. The whole network is trained offline in an end-to-end fashion with large-scale image pairs using the standard SGD algorithm with back-propagation. Extensive experiments on five challenging benchmarks, i.e., VOT2016, VOT2018, OTB100, UAV123 and LTB35, demonstrate that SiamMan achieves leading accuracy with high efficiency. Code can be found at this https URL.

[1]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[2]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[5]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[6]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[7]  Michael Felsberg,et al.  ATOM: Accurate Tracking by Overlap Maximization , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  John K. Tsotsos,et al.  Fast Visual Object Tracking with Rotated Bounding Boxes , 2019, ArXiv.

[9]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[10]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[11]  Stephen Lin,et al.  GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[12]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[13]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[14]  Rynson W. H. Lau,et al.  VITAL: VIsual Tracking via Adversarial Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[16]  Huchuan Lu,et al.  Visual Tracking via Adaptive Spatially-Regularized Correlation Filters , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Mohamed H. Abdelpakey,et al.  DomainSiam: Domain-Aware Siamese Network for Visual Object Tracking , 2019, ISVC.

[18]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Jianbing Shen,et al.  Triplet Loss in Siamese Network for Object Tracking , 2018, ECCV.

[21]  Luc Van Gool,et al.  Learning Discriminative Model Prediction for Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Huchuan Lu,et al.  Correlation Tracking via Joint Discrimination and Reliability Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[26]  Wei Wu,et al.  SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Ning Sun,et al.  Fully Conventional Anchor-Free Siamese Networks for Object Tracking , 2019, IEEE Access.

[28]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[29]  Haibin Ling,et al.  Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Josef Kittler,et al.  Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking , 2018, IEEE Transactions on Image Processing.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Qiang Wang,et al.  Fast Online Object Tracking and Segmentation: A Unifying Approach , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[36]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[38]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Feng Li,et al.  Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Jiri Matas,et al.  Now you see me: evaluating performance in long-term visual tracking , 2018, ArXiv.