Hierarchical Tracking by Reinforcement Learning-Based Searching and Coarse-to-Fine Verifying

A class-agnostic tracker typically consists of three key components: a motion model, a target appearance model, and an updating strategy. However, most recent top-performing trackers focus on constructing complicated appearance models and updating strategies while relying on comparatively simple, heuristic motion models, which may result in inefficient search and degrade tracking performance. To address this issue, we propose a hierarchical tracker that learns to move and track by combining data-driven search at the coarse level with coarse-to-fine verification at the fine level. At the coarse level, a data-driven motion model learned by deep recurrent reinforcement learning provides our tracker with a coarse localization of the object. By formulating motion search as an action-decision problem in reinforcement learning, our tracker uses a deep Q-network built on a recurrent convolutional neural network to learn data-driven search policies. The learned motion model not only significantly reduces the search space but also provides more reliable regions of interest for subsequent verification. At the fine level, a kernelized correlation filter (KCF)-based appearance model densely yet efficiently verifies a local region centered on the location predicted by the motion model. Through the use of circulant matrices and the fast Fourier transform, the KCF-based appearance model can evaluate a large number of candidate samples in the local region efficiently. Finally, a simple yet robust estimator is designed to detect possible tracking failures. Experiments on OTB50 and OTB100 show that our tracker achieves better performance than state-of-the-art trackers.
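To illustrate how circulant structure and the fast Fourier transform allow every cyclic shift of a template to be scored at once, the following is a minimal 1-D sketch of a correlation filter with a linear kernel. It is not the authors' implementation: the function names (`train_filter`, `detect`, `estimate_shift`), the Gaussian regression target, the regularization value, and the random test signal are all assumptions chosen for illustration, and a real tracker would use 2-D image features, a cosine window, and a library FFT rather than the small recursive one included here to keep the example self-contained.

```python
import cmath
import math
import random


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(values):
    """Inverse FFT with the 1/n normalization applied once at the top level."""
    n = len(values)
    return [v / n for v in fft(values, inverse=True)]


def gaussian_target(n, sigma=2.0):
    """Regression target peaked at shift 0, wrapping circularly (assumed shape)."""
    return [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]


def train_filter(x, y, lam=1e-4):
    """Dual ridge-regression coefficients in the Fourier domain (linear kernel).

    The Gram matrix over all cyclic shifts of x is circulant, so the solve
    reduces to an element-wise division by the spectrum of the autocorrelation.
    """
    x_hat, y_hat = fft(x), fft(y)
    return [yh / ((xh.conjugate() * xh).real + lam) for xh, yh in zip(x_hat, y_hat)]


def detect(alpha_hat, x, z):
    """Response of the filter for every cyclic shift of the candidate z."""
    x_hat, z_hat = fft(x), fft(z)
    k_hat = [xh.conjugate() * zh for xh, zh in zip(x_hat, z_hat)]
    response = ifft([kh * ah for kh, ah in zip(k_hat, alpha_hat)])
    return [r.real for r in response]


def estimate_shift(x, z, lam=1e-4):
    """Train on template x, then return the shift with the highest response on z."""
    alpha_hat = train_filter(x, gaussian_target(len(x)), lam)
    response = detect(alpha_hat, x, z)
    return max(range(len(response)), key=response.__getitem__)


# Hypothetical test signal: a random template and a cyclically shifted copy.
rng = random.Random(0)
template = [rng.random() for _ in range(64)]
true_shift = 5
shifted = template[-true_shift:] + template[:-true_shift]
print(estimate_shift(template, shifted))
```

Under these assumptions the peak of the response should fall at the shift applied to the template. Training and detection each cost a few length-n transforms, whereas scoring the n shifted candidates one by one would cost on the order of n squared operations, which is the efficiency argument made for the fine-level verification.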
