SiamDA: Dual attention Siamese network for real-time visual tracking

Abstract Fully convolutional Siamese network (SiamFC) has demonstrated high performance in the visual tracking field, but the learned CNN features are redundant and not discriminative to separate the object from the background. To address the above problem, this paper proposes a dual attention module that is integrated into the Siamese network to select the features both in the spatial and channel domains. Especially, a non-local attention module is followed by the last layer of the network, and this benefit to obtain the self-attention feature map of the target from the spatial dimension. On the other hand, a channel attention module is proposed to adjust the importance of different channels’ features according to the corresponding responses generated by each channel feature and the target. Additionally, the GOT10k data set is employed to train our dual attention Siamese network (SiamDA) to improve the target representation ability, which enhances the discrimination of the model. Experimental results show that the proposed algorithm improves the accuracy by 7.6% and the success rate by 5.6% compared with the baseline tracker.

[1]  Jiri Matas,et al.  Discriminative Correlation Filter Tracker with Channel and Spatial Reliability , 2016, International Journal of Computer Vision.

[2]  Wei Feng,et al.  Selective Spatial Regularization by Reinforcement Learned Decision Making for Object Tracking , 2019, IEEE Transactions on Image Processing.

[3]  Wei Feng,et al.  Fast Learning of Spatially Regularized and Content Aware Correlation Filter for Visual Tracking , 2020, IEEE Transactions on Image Processing.

[4]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[5]  Qiang Wang,et al.  Do not Lose the Details: Reinforced Representation Learning for High Performance Visual Tracking , 2018, IJCAI.

[6]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Yuan Dong,et al.  Correlation Filters with Weighted Convolution Responses , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[9]  Michael Felsberg,et al.  Unveiling the Power of Deep Tracking , 2018, ECCV.

[10]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[12]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[15]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Zhipeng Zhang,et al.  Deeper and Wider Siamese Networks for Real-Time Visual Tracking , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Qing Guo,et al.  Content-Related Spatial Regularization for Visual Object Tracking , 2018, 2018 IEEE International Conference on Multimedia and Expo (ICME).

[20]  Michael Felsberg,et al.  Coloring Channel Representations for Visual Tracking , 2015, SCIA.

[21]  Qing Guo,et al.  Dynamic Saliency-Aware Regularization for Correlation Filter-Based Object Tracking , 2019, IEEE Transactions on Image Processing.

[22]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[25]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Michael Felsberg,et al.  Convolutional Features for Correlation Filter Based Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[27]  Huchuan Lu,et al.  Deep visual tracking: Review and experimental comparison , 2018, Pattern Recognit..

[28]  Bruce A. Draper,et al.  Visual object tracking using adaptive correlation filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[29]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[30]  Huchuan Lu,et al.  Structured Siamese Network for Real-Time Visual Tracking , 2018, ECCV.

[31]  Xin Zhao,et al.  GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Qingming Huang,et al.  Hedged Deep Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Michael Felsberg,et al.  The Sixth Visual Object Tracking VOT2018 Challenge Results , 2018, ECCV Workshops.

[34]  Jiri Matas,et al.  A Novel Performance Evaluation Methodology for Single-Target Trackers , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[36]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ming-Hsuan Yang,et al.  Long-term correlation tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Song Wang,et al.  Learning Dynamic Siamese Network for Visual Object Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.