Robust visual tracking based on global-and-local search with confidence reliability estimation

Abstract Visual object tracking is an open and challenging problem: an online tracker must keep track of the target object over long time periods, even in complex scenarios such as target drift and background occlusion. Discriminative correlation filters (DCF) have shown excellent performance in short-term tracking thanks to their circular dense sampling mechanism and fast computation via the discrete Fourier transform. However, they tend to drift from the target when it undergoes drastic deformation, fast motion, or background occlusion. Because the tracker searches for the target only in a local region centered on the target's position in the previous frame, such drift can lead to bad model updates, and there is no recovery mechanism for re-identifying and re-locating the target. To handle this issue, this paper proposes a global-and-local search technique that applies a DCF-based tracking model and a novel target-aware detector in a collaborative way. The tracking model performs local search while tracking confidence is high, and the target-aware detector is executed to re-identify and locate the target via global search over the entire frame when the proposed tracking system detects model instability and confidence fluctuation. Additionally, we design an enhanced peak-to-sidelobe ratio (EPSR) for confidence estimation, which indicates the degree of system instability and fluctuation. The local tracking model and the target-aware detector are thus applied collaboratively for both final target state estimation and online model updates. This not only avoids model corruption from bad updates, but also prevents the tracker from drifting during long-term tracking. Experiments on the OTB-100 and VOT2016 benchmarks demonstrate that the proposed method achieves state-of-the-art tracking performance in terms of accuracy and robustness, at a tracking speed of 22 fps (close to real time) on a single GPU.
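
The confidence-gated switch between local and global search described above can be illustrated with a minimal sketch. The abstract does not define EPSR, so this example substitutes the classic peak-to-sidelobe ratio (PSR) used with correlation filters; it is not the paper's EPSR. The function names, the 11x11 exclusion window (`exclude=5`), and the threshold value are assumptions chosen for illustration only.

```python
import numpy as np


def peak_to_sidelobe_ratio(response, exclude=5):
    """Classic PSR of a correlation response map (a stand-in, not the paper's EPSR).

    PSR = (peak - mean(sidelobe)) / std(sidelobe), where the sidelobe is every
    pixel outside a (2 * exclude + 1)-wide square window around the peak.
    """
    response = np.asarray(response, dtype=np.float64)
    rows, cols = response.shape
    peak_row, peak_col = np.unravel_index(np.argmax(response), response.shape)
    peak = response[peak_row, peak_col]

    # Mask out the window around the peak; the remaining pixels form the sidelobe.
    mask = np.ones(response.shape, dtype=bool)
    r0, r1 = max(peak_row - exclude, 0), min(peak_row + exclude + 1, rows)
    c0, c1 = max(peak_col - exclude, 0), min(peak_col + exclude + 1, cols)
    mask[r0:r1, c0:c1] = False

    sidelobe = response[mask]
    if sidelobe.size == 0:
        raise ValueError("response map is too small for the exclusion window")
    # Small epsilon guards against a perfectly flat sidelobe.
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)


def choose_search_mode(response, threshold=7.0):
    """Return "local" when confidence is high, "global" otherwise.

    The threshold is a hypothetical value for this sketch, not one reported
    in the paper.
    """
    return "local" if peak_to_sidelobe_ratio(response) >= threshold else "global"
```

In such a scheme, a sharp, unimodal response keeps the tracker in local search, while a flat or noisy response would hand control to a global detector and could also be used to skip the model update for that frame.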
