FasterMDNet: Learning model adaptation by RNN in tracking-by-detection based visual tracking

In recent VOT competitions, trackers based on tracking-by-detection with deep neural network discriminators have reached impressive accuracy. However, these trackers rely on time-consuming model adaptation, such as online learning, to handle target appearance changes, which greatly increases their time complexity and hinders real-world application. In this paper, we propose an efficient RNN-based model adaptation method that drastically reduces the time complexity of such trackers. The proposed model learns the relation of relative model changes during RNN training and, at test time, predicts the classification score and the model adaptation state simultaneously, nearly eliminating online fine-tuning at the cost of additional offline RNN training. The method is applicable to any tracker built on a neural network discriminator, and the RNN branch can be extended to a more complex model when sufficient training videos are available. We apply the proposed algorithm to MDNet and obtain a new tracker, FasterMDNet. Experiments show that our method nearly removes the fine-tuning time, reducing the time-complexity bottleneck to the prediction (forward-pass) time alone.
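The abstract does not give implementation details, but the described mechanism — a recurrent branch that carries a "model adaptation state" across frames and scores candidates without gradient-based fine-tuning — can be illustrated with a minimal sketch. The module names, dimensions, and the use of a GRU cell below are assumptions for illustration, not the authors' architecture.

```python
# Hedged sketch (not the paper's code): a tracking-by-detection head in which a
# recurrent branch replaces online fine-tuning. The GRU hidden state stands in
# for the "model adaptation state"; the same branch emits the target/background
# score for each candidate patch. All names and sizes are hypothetical.
import torch
import torch.nn as nn

class RecurrentAdaptationHead(nn.Module):
    def __init__(self, feat_dim=512, state_dim=256):
        super().__init__()
        # Recurrent cell that updates the adaptation state from the current feature.
        self.rnn = nn.GRUCell(feat_dim, state_dim)
        # Scoring layer conditioned on both the feature and the adaptation state.
        self.score = nn.Linear(feat_dim + state_dim, 2)  # target vs. background

    def forward(self, feat, state):
        # feat:  (B, feat_dim) candidate features from a frozen CNN backbone
        # state: (B, state_dim) adaptation state carried over from previous frames
        new_state = self.rnn(feat, state)                # predicted model adaptation
        logits = self.score(torch.cat([feat, new_state], dim=1))
        return logits, new_state

# Per-frame usage: score candidates with the current state, then roll the state
# forward using the feature of the selected target (no gradient-based fine-tuning).
if __name__ == "__main__":
    head = RecurrentAdaptationHead()
    state = torch.zeros(1, 256)                # initialised on the first frame
    candidate_feats = torch.randn(10, 512)     # features of 10 candidate boxes
    logits, _ = head(candidate_feats, state.expand(10, -1))
    best = logits[:, 1].argmax().item()
    _, state = head(candidate_feats[best:best + 1], state)  # update adaptation state
```

Because the adaptation state is produced by a single forward pass rather than iterative online optimisation, per-frame cost is dominated by the network prediction itself, which is the speed-up the abstract claims.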
