Recurrent Filter Learning for Visual Tracking

Recently using convolutional neural networks (CNNs) has gained popularity in visual tracking, due to its robust feature representation of images. Recent methods perform online tracking by fine-tuning a pre-trained CNN model to the specific target object using stochastic gradient descent (SGD) back-propagation, which is usually time-consuming. In this paper, we propose a recurrent filter generation methods for visual tracking. We directly feed the target's image patch to a recurrent neural network (RNN) to estimate an object-specific filter for tracking. As the video sequence is a spatiotemporal data, we extend the matrix multiplications of the fully-connected layers of the RNN to a convolution operation on feature maps, which preserves the target's spatial structure and also is memory-efficient. The tracked object in the subsequent frames will be fed into the RNN to adapt the generated filters to appearance variations of the target. Note that once the off-line training process of our network is finished, there is no need to fine-tune the network for specific objects, which makes our approach more efficient than methods that use iterative fine-tuning to online learn the target. Extensive experiments conducted on widely used benchmarks, OTB and VOT, demonstrate encouraging results compared to other recent methods.

[1]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[3]  Zhen Cui,et al.  Recurrently Target-Attending Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Rui Caseiro,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence High-speed Tracking with Kernelized Correlation Filters , 2022 .

[5]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  Ming-Hsuan Yang,et al.  Robust Object Tracking with Online Multiple Instance Learning , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Bohyung Han,et al.  Modeling and Propagating CNNs in a Tree Structure for Visual Tracking , 2016, ArXiv.

[10]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[11]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[12]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[13]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[14]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[15]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[16]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jianke Zhu,et al.  A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration , 2014, ECCV Workshops.

[18]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[20]  Vibhav Vineet,et al.  Struck: Structured Output Tracking with Kernels , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Qingming Huang,et al.  Hedged Deep Tracking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jin Gao,et al.  Transfer Learning Based Visual Tracking with Gaussian Processes Regression , 2014, ECCV.

[26]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Michael Felsberg,et al.  Convolutional Features for Correlation Filter Based Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[28]  Haibin Ling,et al.  SANet: Structure-Aware Network for Visual Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[29]  Gao,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[30]  Stan Sclaroff,et al.  MEEM: Robust Tracking via Multiple Experts Using Entropy Minimization , 2014, ECCV.

[31]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Abhinav Gupta,et al.  Transferring Rich Feature Hierarchies for Robust Visual Tracking , 2015, ArXiv.

[34]  Christopher Joseph Pal,et al.  RATM: Recurrent Attentive Tracking Model , 2015, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Zhihai He,et al.  Spatially supervised recurrent convolutional neural networks for visual object tracking , 2016, 2017 IEEE International Symposium on Circuits and Systems (ISCAS).

[36]  Andrew Y. Ng,et al.  End-to-End People Detection in Crowded Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yann LeCun,et al.  Computing the stereo matching cost with a convolutional neural network , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Michael Felsberg,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[39]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[40]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[41]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[43]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[44]  Konrad Schindler,et al.  Online Multi-Target Tracking Using Recurrent Neural Networks , 2016, AAAI.

[45]  Zheng Zhang,et al.  First Step toward Model-Free, Anonymous Object Tracking with Recurrent Neural Networks , 2015, ArXiv.

[46]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[47]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).