Learning Reinforced Attentional Representation for End-to-End Visual Tracking

Despite the fact that tremendous advances have been made by numerous recent tracking approaches in the last decade, how to achieve high-performance visual tracking is still an open problem. In this paper, we propose an end-to-end network model to learn reinforced attentional representation for accurate target object discrimination and localization. We utilize a novel hierarchical attentional module with long short-term memory and multi-layer perceptrons to leverage both inter- and intra-frame attention to effectively facilitate visual pattern emphasis. Moreover, we incorporate a contextual attentional correlation filter into the backbone network to make our model be trained in an end-to-end fashion. Our proposed approach not only takes full advantage of informative geometries and semantics, but also updates correlation filters online without the backbone network fine-tuning to enable adaptation of target appearance variations. Extensive experiments conducted on several popular benchmark datasets demonstrate the effectiveness and efficiency of our proposed approach while remaining computational efficiency.

[1]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[2]  Wei Wu,et al.  Distractor-aware Siamese Networks for Visual Object Tracking , 2018, ECCV.

[3]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Cordelia Schmid,et al.  Learning Color Names for Real-World Applications , 2009, IEEE Transactions on Image Processing.

[5]  Huchuan Lu,et al.  Multi attention module for visual tracking , 2019, Pattern Recognit..

[6]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Jürgen Schmidhuber,et al.  Deep learning in neural networks: An overview , 2014, Neural Networks.

[8]  Chun Yuan,et al.  Learning attentional recurrent neural network for visual tracking , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[9]  Ming-Hsuan Yang,et al.  Learning Spatial-Aware Regressions for Visual Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.

[11]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Yiannis Demiris,et al.  Context-Aware Deep Feature Compression for High-Speed Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[14]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[15]  Yiannis Demiris,et al.  Attentional Correlation Filter Network for Adaptive Visual Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[17]  Qiang Wang,et al.  Do not Lose the Details: Reinforced Representation Learning for High Performance Visual Tracking , 2018, IJCAI.

[18]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Huchuan Lu,et al.  Deep visual tracking: Review and experimental comparison , 2018, Pattern Recognit..

[21]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[22]  Wei Wu,et al.  End-to-End Flow Correlation Tracking with Spatial-Temporal Attention , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Wei Liu,et al.  Neural Compatibility Modeling with Attentive Knowledge Distillation , 2018, SIGIR.

[24]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[26]  Wei Wu,et al.  High Performance Visual Tracking with Siamese Region Proposal Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Fei Wang,et al.  Siamese Attentional Keypoint Network for High Performance Visual Tracking , 2019, Knowl. Based Syst..

[28]  Peng Gao,et al.  Learning Cascaded Siamese Networks for High Performance Visual Tracking , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[29]  Fei Wang,et al.  High Performance Visual Tracking with Circular and Structural Operators , 2018, Knowl. Based Syst..

[30]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[31]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[32]  Michael Felsberg,et al.  The Visual Object Tracking VOT2017 Challenge Results , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[33]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[34]  Qingming Huang,et al.  Hedged Deep Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[36]  Chong Luo,et al.  A Twofold Siamese Network for Real-Time Object Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Bernard Ghanem,et al.  Context-Aware Correlation Filter Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[39]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Michael Felsberg,et al.  Discriminative Scale Space Tracking , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[43]  Fei Wang,et al.  Adaptive Object Tracking with Complementary Models , 2018, IEICE Trans. Inf. Syst..

[44]  Ling Shao,et al.  Recent advances and trends in visual tracking: A review , 2011, Neurocomputing.

[45]  Jing Peng,et al.  SVM vs regularized least squares classification , 2004, ICPR 2004.

[46]  Jianbing Shen,et al.  Triplet Loss in Siamese Network for Object Tracking , 2018, ECCV.

[47]  Tao Zhuo,et al.  Fast Video Object Segmentation via Mask Transfer Network , 2019, ArXiv.

[48]  Bruce A. Draper,et al.  Visual object tracking using adaptive correlation filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[49]  Jiri Matas,et al.  Discriminative Correlation Filter with Channel and Spatial Reliability , 2017, CVPR.

[50]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[53]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[54]  Mohan S. Kankanhalli,et al.  A^3NCF: An Adaptive Aspect Attention Model for Rating Prediction , 2018, IJCAI.

[55]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Antoni B. Chan,et al.  Recurrent Filter Learning for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[57]  Fei Wang,et al.  Efficient Multi-level Correlating for Visual Tracking , 2018, ACCV.

[58]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[59]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[60]  Linmei Hu,et al.  Virtually Trying on New Clothing with Arbitrary Poses , 2019, ACM Multimedia.

[61]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[62]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).