i-Siam: Improving Siamese Tracker with Distractors Suppression and Long-Term Strategies

Recently, Siamese Network (SiamFC) has attracted much attention due to its fast tracking capability. Despite many improvements introduced, its accuracy is still far from human performance. In this work, we show that there are several design flaws in SiamFC tracker. In particular, the negative signals produced by SiamFC lead to noisy response map. In addition, background noises prevent SiamFC from extracting clean features from the template. To suppress these distractions, first we propose a negative signal suppression approach such that irrelevant features are deactivated. Secondly, we demonstrate that image-level suppression is also important in maximizing the tracking accuracy in addition to the existing feature-level suppression. With better detection sensitivity, we further propose a Diverse Multi-Template approach for appearance adaptation while reducing the risk of template drifting during long-term tracking. In our experiments, we conduct extensive ablation studies to demonstrate the effectiveness of the proposed components. Our tracker named improved Siamese (i-Siam) Tracker is able to achieve state-of-the-art results on UAV123, OTB-100, OxUvA, and TLP datasets compared to the existing trackers. Nonetheless, our tracker runs in real time, which is around 43 FPS.

[1]  Arnold W. M. Smeulders,et al.  Long-term Tracking in the Wild: A Benchmark , 2018, ECCV.

[2]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[7]  Bernard Ghanem,et al.  A Benchmark and Simulator for UAV Tracking , 2016, ECCV.

[8]  Antoni B. Chan,et al.  Learning Dynamic Memory Networks for Object Tracking , 2018, ECCV.

[9]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Tae Hee Han,et al.  Fast Normalized Cross-Correlation , 2009, Circuits Syst. Signal Process..

[11]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Qinghua Hu,et al.  Vision Meets Drones: A Challenge , 2018, ArXiv.

[13]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[15]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[16]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[17]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Wenjun Zeng,et al.  Learning to Update for Object Tracking , 2018, ArXiv.

[21]  Michael Felsberg,et al.  Convolutional Features for Correlation Filter Based Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[22]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Simon Lucey,et al.  Learning Policies for Adaptive Tracking with Deep Feature Cascades , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[27]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Simon Lucey,et al.  Learning Background-Aware Correlation Filters for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Chong Luo,et al.  Towards a Better Match in Siamese Network Based Visual Object Tracker , 2018, ECCV Workshops.

[30]  Vineet Gandhi,et al.  Long-Term Visual Object Tracking Benchmark , 2017, ACCV.

[31]  Huchuan Lu,et al.  Deep visual tracking: Review and experimental comparison , 2018, Pattern Recognit..