Learning Hierarchical Features for Visual Object Tracking With Recursive Neural Networks

Recently, deep learning has achieved very promising results in visual object tracking. Deep neural networks in existing tracking methods require a lot of training data to learn a large number of parameters. However, training data is not sufficient for visual object tracking as annotations of a target object are only available in the first frame of a test sequence. In this paper, we propose to learn hierarchical features for visual object tracking by using tree structure based Recursive Neural Networks (RNN), which have a relatively small number of parameters compared to other deep neural networks (e.g. Convolutional Neural Networks (CNN)) due to all basic modules in RNN share only one set of parameters. Experimental results demonstrate that our feature learning algorithm can significantly improve tracking performance on benchmark datasets.

[1]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Dit-Yan Yeung,et al.  Learning a Deep Compact Image Representation for Visual Tracking , 2013, NIPS.

[3]  Andrew Y. Ng,et al.  Convolutional-Recursive Deep Learning for 3D Object Classification , 2012, NIPS.

[4]  Ming-Hsuan Yang,et al.  Visual tracking with online Multiple Instance Learning , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Jeffrey Pennington,et al.  Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection , 2011, NIPS.

[6]  Shaogang Gong,et al.  Tracking colour objects using adaptive mixture models , 1999, Image Vis. Comput..

[7]  Shai Avidan,et al.  Support Vector Tracking , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8]  Jiri Matas,et al.  P-N learning: Bootstrapping binary classifiers by structural constraints , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Christoph Goller,et al.  Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[10]  Gang Wang,et al.  Learning deep features for multiple object tracking by using a multi-task learning strategy , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[11]  Chunhua Shen,et al.  Real-time visual tracking using compressive sensing , 2011, CVPR 2011.

[12]  Gérard G. Medioni,et al.  Context tracker: Exploring supporters and distracters in unconstrained environments , 2011, CVPR 2011.

[13]  Hanqing Lu,et al.  A robust boosting tracker with minimum error bound in a co-training framework , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Haibin Ling,et al.  Real time robust L1 tracker using accelerated proximal gradient approach , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Rui Caseiro,et al.  High-Speed Tracking with Kernelized Correlation Filters , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Junliang Xing,et al.  Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Junseok Kwon,et al.  Visual tracking decomposition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Seunghoon Hong,et al.  Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network , 2015, ICML.

[19]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Horst Bischof,et al.  On-line Boosting and Vision , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  J. Nocedal Updating Quasi-Newton Matrices With Limited Storage , 1980 .

[22]  Ming-Hsuan Yang,et al.  Incremental Learning for Robust Visual Tracking , 2008, International Journal of Computer Vision.

[23]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[24]  Gang Hua,et al.  Discriminative Tracking by Metric Learning , 2010, ECCV.

[25]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Qi Zhao,et al.  Co-Tracking Using Semi-Supervised Support Vector Machines , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Huchuan Lu,et al.  Visual tracking via adaptive structural local sparse appearance model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Xiaoqin Zhang,et al.  Graph Based Discriminative Learning for Robust and Efficient Object Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  Gang Wang,et al.  Video tracking using learned hierarchical features. , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[30]  Jin Gao,et al.  Transfer Learning Based Visual Tracking with Gaussian Processes Regression , 2014, ECCV.

[31]  Nan Jiang,et al.  Order determination and sparsity-regularized metric learning adaptive visual tracking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Ming Tang,et al.  Robust tracking via weakly supervised ranking SVM , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Andrew Y. Ng,et al.  Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.

[34]  Feng Li,et al.  Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[36]  Horst Bischof,et al.  On-line semi-supervised multiple-instance boosting , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[38]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[39]  Simone Calderara,et al.  Visual Tracking: An Experimental Survey , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Huchuan Lu,et al.  Robust Object Tracking via Sparse Collaborative Appearance Model , 2014, IEEE Transactions on Image Processing.

[41]  Andrew Y. Ng,et al.  Semantic Compositionality through Recursive Matrix-Vector Spaces , 2012, EMNLP.

[42]  Haibin Ling,et al.  Robust Visual Tracking using 1 Minimization , 2009 .

[43]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Junseok Kwon,et al.  Tracking by Sampling Trackers , 2011, 2011 International Conference on Computer Vision.

[46]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking , 2014, BMVC.