Visual tracking with complementary deep feature optimization

Abstract. The challenges of rotation, occlusion, and scale change in visual object tracking have been an urgent problem to be solved. In recent years, the online learning ability and discriminative power of convolutional neural network (CNN) features have gained great attention in many computer vision tasks. Previous studies have shown that individual layer CNN features can be used for dealing with specific tracking challenges, such as target accurate positioning, rotation, and deformation. However, tracking with single CNN features is not enough to deal with serious challenges, such as the above-mentioned. To handle this problem, we evaluate the contribution of specific CNN layers features in tracking tasks and present a complementary CNN features tracking framework, which treats tracking procedure as a deep CNN features optimizing. Our tracker does not merely fuse the CNN features but is a semantic level method, which adds a competition mechanism to optimize the target appearance encoding of CNN layers. Therefore, the proposed tracker is robust not only to target location but also to rotation and deformation. We also embed hard negative mining to enhance the discriminative power of the tracking model and use bounding box regression to refine the tracking result. Having compared our tracker performance with other state-of-the-art trackers, the obtained experimental results in a large-scale dataset demonstrate that the effectiveness of our proposed tracker outperforms state-of-the-art ones.

[1]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Svetlana Lazebnik,et al.  Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Chen Bo,et al.  Vision-based Object Detection and Tracking: A Review , 2016 .

[4]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[5]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Michael Felsberg,et al.  Adaptive Color Attributes for Real-Time Visual Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Dit-Yan Yeung,et al.  Understanding and Diagnosing Visual Tracking Systems , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Narendra Ahuja,et al.  Robust Visual Tracking Using Oblique Random Forests , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[10]  Lei Zhang,et al.  Object Tracking via Dual Linear Structured SVM and Explicit Feature Map , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Xiaogang Wang,et al.  STCT: Sequentially Training Convolutional Networks for Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Ming-Hsuan Yang,et al.  Long-term correlation tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Dit-Yan Yeung,et al.  Learning a Deep Compact Image Representation for Visual Tracking , 2013, NIPS.

[18]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Nicu Sebe,et al.  Computer Vision – ECCV 2016 , 2016, Lecture Notes in Computer Science.

[20]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[21]  Bingbing Ni,et al.  When Correlation Filters Meet Convolutional Neural Networks for Visual Tracking , 2016, IEEE Signal Processing Letters.

[22]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[24]  Michael Felsberg,et al.  Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Le Zhang,et al.  Robust visual tracking via co-trained Kernelized correlation filters , 2017, Pattern Recognit..

[27]  Ales Leonardis,et al.  Visual Object Tracking Performance Measures Revisited , 2015, IEEE Transactions on Image Processing.

[28]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[29]  Jia Deng,et al.  Large scale visual recognition , 2012 .

[30]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Simon Lucey,et al.  Learning Policies for Adaptive Tracking with Deep Feature Cascades , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[33]  Zhenyu He,et al.  Connected Component Model for Multi-Object Tracking , 2016, IEEE Transactions on Image Processing.

[34]  Takahiro Ishikawa,et al.  The template update problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[36]  Luca Bertinetto,et al.  Staple: Complementary Learners for Real-Time Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Rui Caseiro,et al.  Exploiting the Circulant Structure of Tracking-by-Detection with Kernels , 2012, ECCV.

[39]  Qingshan Liu,et al.  Robust Visual Tracking via Convolutional Networks Without Training , 2015, IEEE Transactions on Image Processing.

[40]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[41]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[42]  Chi-Woo Lee,et al.  Visual object tracking using particle filter , 2012, 2012 9th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI).

[43]  Michael Felsberg,et al.  Convolutional Features for Correlation Filter Based Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[44]  Shai Avidan,et al.  Ensemble Tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Yuan Dong,et al.  Correlation Filters with Weighted Convolution Responses , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[46]  Junseok Kwon,et al.  Visual tracking decomposition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.