Paper Information - Learning Reinforced Attentional Representation for End-to-End Visual Tracking

Although recent tracking approaches have made tremendous advances over the last decade, achieving high-performance visual tracking remains an open problem. In this paper, we propose an end-to-end network model that learns a reinforced attentional representation for accurate target object discrimination and localization. We use a novel hierarchical attentional module, built from long short-term memory and multi-layer perceptrons, that leverages both inter- and intra-frame attention to emphasize informative visual patterns. Moreover, we incorporate a contextual attentional correlation filter into the backbone network so that our model can be trained end-to-end. Our proposed approach not only takes full advantage of informative geometries and semantics, but also updates the correlation filters online without fine-tuning the backbone network, which enables adaptation to target appearance variations. Extensive experiments conducted on several popular benchmark datasets demonstrate the effectiveness of our proposed approach while maintaining computational efficiency.
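To make the idea of updating a correlation filter online without touching the feature extractor more concrete, the following is a minimal sketch of a generic single-channel correlation filter solved in the Fourier domain, in the style of classic MOSSE-type trackers. It is not the contextual attentional correlation filter proposed in the paper. The class name `OnlineCorrelationFilter`, the helper `gaussian_label`, and the parameters `lam` (regularization) and `lr` (learning rate) are hypothetical names chosen for illustration, and a random array stands in for backbone features.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a 2-D Gaussian peaked at the patch centre."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))


class OnlineCorrelationFilter:
    """Illustrative ridge-regression correlation filter (an assumption,
    not the paper's model). The filter is stored as a numerator and a
    denominator in the Fourier domain, both updated by running averages."""

    def __init__(self, lam=1e-2, lr=0.02):
        self.lam = lam    # regularization added to the denominator
        self.lr = lr      # interpolation weight for online updates
        self.num = None   # running average of G * conj(F)
        self.den = None   # running average of |F|^2

    def update(self, patch, label):
        """Fold one (feature patch, desired response) pair into the filter."""
        F = np.fft.fft2(patch)
        G = np.fft.fft2(label)
        num = G * np.conj(F)
        den = (F * np.conj(F)).real
        if self.num is None:
            self.num, self.den = num, den
        else:
            self.num = (1 - self.lr) * self.num + self.lr * num
            self.den = (1 - self.lr) * self.den + self.lr * den

    def respond(self, patch):
        """Return the spatial response map for a search patch."""
        Z = np.fft.fft2(patch)
        return np.real(np.fft.ifft2(Z * self.num / (self.den + self.lam)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((32, 32))  # stand-in for backbone features
    tracker = OnlineCorrelationFilter()
    tracker.update(features, gaussian_label(features.shape))

    # Simulate target motion with a circular shift of the feature patch.
    shifted = np.roll(features, (3, 5), axis=(0, 1))
    response = tracker.respond(shifted)
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(response), response.shape))
    print("response peak:", peak)
```

In this sketch the response peak moves away from the patch centre by the same offset as the simulated target motion, which is how a correlation filter localizes the target. Each call to `update` only adjusts the filter's running averages, so the features themselves never need retraining, mirroring the abstract's point about avoiding backbone fine-tuning.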
