Object-Adaptive LSTM Network for Visual Tracking

Convolutional Neural Networks (CNNs) have shown outstanding performance in visual object tracking. However, most classification-based tracking methods that use CNNs are time-consuming because of the expensive computation required for complex online fine-tuning and massive feature extraction. Moreover, these methods suffer from over-fitting, since the CNN models are trained and tested on videos from the same domain. Recently, matching-based tracking methods (such as Siamese networks) have shown a remarkable speed advantage. However, they cannot handle target appearance variations and complex scenes well, because they inherently lack online adaptability and background information. In this paper, we propose a novel object-adaptive LSTM network that effectively exploits sequence dependencies and dynamically adapts to temporal object variations by constructing an intrinsic model of object appearance and motion. In addition, we develop an efficient proposal selection strategy. Densely sampled proposals are first pre-evaluated with a fast matching-based method, and only the selected high-quality proposals are then fed to the sequence-specific LSTM network. This strategy enables our method to adaptively track arbitrary objects while operating faster than conventional CNN-based classification trackers. To the best of our knowledge, this is the first work to apply an LSTM network to classification in visual object tracking. Experimental results on the OTB and TC-128 benchmarks show that the proposed method achieves state-of-the-art performance, demonstrating the great potential of recurrent structures for visual object tracking.
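The two-stage proposal selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Boxes are assumed to be (cx, cy, w, h) tuples, and proposals are assumed to be drawn by Gaussian jitter around the previous box. `match_score` and `classify` are hypothetical callables. `match_score` stands in for the fast matching-based scorer (for example, a Siamese similarity), and `classify` stands in for the LSTM classifier. The names `sample_proposals`, `select_and_classify`, `top_k`, `pos_sigma`, and `scale_sigma` are illustrative choices, not from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (cx, cy, w, h)


def sample_proposals(prev: Box, n: int, rng: random.Random,
                     pos_sigma: float = 0.3, scale_sigma: float = 0.05) -> List[Box]:
    """Densely sample candidate boxes around the previous box (assumed Gaussian jitter)."""
    cx, cy, w, h = prev
    proposals = []
    for _ in range(n):
        s = math.exp(rng.gauss(0.0, scale_sigma))  # log-normal scale change keeps w, h > 0
        proposals.append((cx + rng.gauss(0.0, pos_sigma) * w,
                          cy + rng.gauss(0.0, pos_sigma) * h,
                          w * s, h * s))
    return proposals


def select_and_classify(proposals: Sequence[Box],
                        match_score: Callable[[Box], float],
                        classify: Callable[[Box], float],
                        top_k: int) -> Tuple[Box, float]:
    """Cheap matching pre-evaluation on all proposals, expensive classifier on the top_k only."""
    if top_k < 1 or not proposals:
        raise ValueError("need top_k >= 1 and at least one proposal")
    # Stage 1: rank every proposal with the fast (hypothetical) matching score.
    shortlist = sorted(proposals, key=match_score, reverse=True)[:top_k]
    # Stage 2: run the expensive (hypothetical) classifier once per shortlisted box.
    scored = [(classify(b), b) for b in shortlist]
    best_score, best_box = max(scored, key=lambda t: t[0])
    return best_box, best_score


if __name__ == "__main__":
    rng = random.Random(0)
    target = (50.0, 40.0, 20.0, 30.0)  # hypothetical true box for the toy scorers
    calls = {"classify": 0}

    def toy_match(b: Box) -> float:
        # Stand-in for a Siamese similarity: negative distance between box centres.
        return -math.hypot(b[0] - target[0], b[1] - target[1])

    def toy_classify(b: Box) -> float:
        # Stand-in for the LSTM classifier; also counts how often it is invoked.
        calls["classify"] += 1
        return toy_match(b) - abs(math.log(b[2] / target[2]))

    props = sample_proposals((48.0, 42.0, 20.0, 30.0), n=256, rng=rng)
    box, score = select_and_classify(props, toy_match, toy_classify, top_k=16)
    print(f"classifier calls: {calls['classify']} of {len(props)} proposals")
    # prints "classifier calls: 16 of 256 proposals"
```

The design point is that the costly classifier runs only `top_k` times per frame instead of once per sampled proposal. This reflects the speed argument made in the abstract. The sampling distribution and the value of `top_k` here are illustrative and are not taken from the paper.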
