Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers

This paper improves state-of-the-art on-line trackers that use deep learning. Such trackers train a deep network to pick a specified object out from the background in an initial frame (initialization) and then keep training the model as tracking proceeds (updates). Our core contribution is a meta-learning-based method to adjust deep networks for tracking using off-line training. First, we learn initial parameters and per-parameter coefficients for fast online adaptation. Second, we use training signal from future frames for robustness to target appearance variations and environment changes. The resulting networks train significantly faster during the initialization, while improving robustness and accuracy. We demonstrate this approach on top of the current highest accuracy tracking approach, tracking-by-detection based MDNet and close competitor, the correlation-based CREST. Experimental results on both standard benchmarks, OTB and VOT2016, show improvements in speed, accuracy, and robustness on both trackers.

[1]  Ali Farhadi,et al.  Re$^3$: Re al-Time Recurrent Regression Networks for Visual Tracking of Generic Objects , 2017, IEEE Robotics and Automation Letters.

[2]  Michael Felsberg,et al.  The Visual Object Tracking VOT2015 Challenge Results , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[3]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[4]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[5]  Bruce A. Draper,et al.  Visual object tracking using adaptive correlation filters , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Michael Felsberg,et al.  ECO: Efficient Convolution Operators for Tracking , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Margrit Betke,et al.  Randomized Ensemble Tracking , 2013, 2013 IEEE International Conference on Computer Vision.

[9]  Rui Caseiro,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence High-speed Tracking with Kernelized Correlation Filters , 2022 .

[10]  Zhe Chen,et al.  MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Luca Bertinetto,et al.  Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[12]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[14]  Yi Li,et al.  DeepTrack: Learning Discriminative Feature Representations by Convolutional Neural Networks for Visual Tracking , 2014, BMVC.

[15]  Marcin Andrychowicz,et al.  Learning to learn by gradient descent by gradient descent , 2016, NIPS.

[16]  Bartunov Sergey,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016 .

[17]  Luca Bertinetto,et al.  End-to-End Representation Learning for Correlation Filter Based Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Misha Denil,et al.  Learning to Learn without Gradient Descent by Gradient Descent , 2016, ICML.

[19]  Ming-Hsuan Yang,et al.  Robust Object Tracking with Online Multiple Instance Learning , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Bernard Ghanem,et al.  Context-Aware Correlation Filter Tracking , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Deva Ramanan,et al.  Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[23]  Jiri Matas,et al.  Face-TLD: Tracking-Learning-Detection applied to faces , 2010, 2010 IEEE International Conference on Image Processing.

[24]  J. Urgen Schmidhuber Learning to Control Fast-weight Memories: an Alternative to Dynamic Recurrent Networks , 1991 .

[25]  Jitendra Malik,et al.  Learning to Optimize , 2016, ICLR.

[26]  Michael Felsberg,et al.  Learning Spatially Regularized Correlation Filters for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Christopher Joseph Pal,et al.  RATM: Recurrent Attentive Tracking Model , 2015, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Ali Farhadi,et al.  Re3 : Real-Time Recurrent Regression Networks for Object Tracking , 2017, ArXiv.

[29]  Pieter Abbeel,et al.  Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments , 2017, ICLR.

[30]  Bohyung Han,et al.  Learning Multi-domain Convolutional Neural Networks for Visual Tracking , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Rynson W. H. Lau,et al.  CREST: Convolutional Residual Learning for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Ming-Hsuan Yang,et al.  Long-term correlation tracking , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Luca Bertinetto,et al.  Learning feed-forward one-shot learners , 2016, NIPS.

[34]  David Pfau,et al.  Unrolled Generative Adversarial Networks , 2016, ICLR.

[35]  David Zhang,et al.  Fast Visual Tracking via Dense Spatio-temporal Context Learning , 2014, ECCV.

[36]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[37]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[38]  Michael Felsberg,et al.  Accurate Scale Estimation for Robust Visual Tracking , 2014, BMVC.

[39]  Silvio Savarese,et al.  Learning to Track at 100 FPS with Deep Regression Networks , 2016, ECCV.

[40]  Michael Felsberg,et al.  The Visual Object Tracking VOT2013 Challenge Results , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[41]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Antoni B. Chan,et al.  Recurrent Filter Learning for Visual Tracking , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[43]  Arnold W. M. Smeulders,et al.  UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[44]  Ming-Hsuan Yang,et al.  Hierarchical Convolutional Features for Visual Tracking , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[45]  Simon Lucey,et al.  Correlation filters with limited boundaries , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[47]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[48]  Ryan P. Adams,et al.  Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.

[49]  Misha Denil,et al.  Learned Optimizers that Scale and Generalize , 2017, ICML.

[50]  Horst Bischof,et al.  Semi-supervised On-Line Boosting for Robust Tracking , 2008, ECCV.

[51]  Zheng Zhang,et al.  First Step toward Model-Free, Anonymous Object Tracking with Recurrent Neural Networks , 2015, ArXiv.

[52]  Zdenek Kalal,et al.  Tracking-Learning-Detection , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Zhenyu He,et al.  The Visual Object Tracking VOT2016 Challenge Results , 2016, ECCV Workshops.

[54]  Sepp Hochreiter,et al.  Learning to Learn Using Gradient Descent , 2001, ICANN.

[55]  Sebastian Thrun,et al.  Learning to Learn: Introduction and Overview , 1998, Learning to Learn.

[56]  Michael Felsberg,et al.  Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking , 2016, ECCV.