Robust tracking based on H-CNN with low-resource sampling and scaling by frame-wise motion localization

In the big data age, learning with deep models has shown outstanding effectiveness in a variety of vision tasks. Unfortunately, the need for enormous numbers of training samples and the high computational cost still limit their practicality in low-resource media computing applications such as online object tracking. Recently, CNN-based feature extraction has helped tracking-by-learning strategies make significant progress, although the coarse-resolution outputs of the last layer still substantially limit further improvement of tracking performance. When the hierarchy of convolutional layers is exploited as an image pyramid representation, the earlier convolutional layers of a hierarchical CNN offer better spatial localization but are less invariant to target appearance changes, which inevitably leads to inaccurate sampling regions when non-rigid objects undergo intrinsic motion. To guarantee qualified sampling for tracking-by-learning with a hierarchical CNN, in this paper we combine inter-frame motion guidance with intra-frame appearance correlations by formulating distinct energy optimization processes in the spatial and temporal domains. With an optional step that combines the extracted regions, the proposed algorithm achieves more precise target localization for qualified sampling. Experiments on a challenging non-rigid tracking benchmark dataset demonstrate superior performance of the proposed tracker compared with other state-of-the-art trackers.
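
The abstract rests on two ingredients: correlation responses from several convolutional layers (deeper layers are coarser but more invariant, shallower ones localize better) and an inter-frame motion cue that corrects the location when a non-rigid target deforms. The following is a minimal sketch under stated assumptions, not the authors' formulation. It assumes each layer yields a 2-D correlation map, upsamples coarse maps by nearest-neighbour interpolation, fuses them with hand-picked weights, and replaces the paper's spatial and temporal energy optimization with a hypothetical per-position energy E(p) = -A(p) - λM(p), where A is the fused appearance response and M is a motion-likelihood map (for instance from frame differencing or optical flow). The function names, weights, toy maps, and λ are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # a 2-D response map stored row-major


def upsample_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour upsampling of a coarse map to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_hierarchical(maps: Sequence[Grid], weights: Sequence[float]) -> Grid:
    """Weighted sum of per-layer correlation maps at the finest resolution.

    Coarser (deeper-layer) maps are upsampled first; the weights are
    hypothetical hand-picked values, not learned or taken from the paper.
    """
    out_h = max(len(m) for m in maps)
    out_w = max(len(m[0]) for m in maps)
    fused = [[0.0] * out_w for _ in range(out_h)]
    for m, w in zip(maps, weights):
        up = upsample_nearest(m, out_h, out_w)
        for r in range(out_h):
            for c in range(out_w):
                fused[r][c] += w * up[r][c]
    return fused


def localize(appearance: Grid, motion: Grid, lam: float) -> Tuple[int, int]:
    """Return the (row, col) minimising the hypothetical energy
    E(p) = -A(p) - lam * M(p); ties keep the first position in row-major order."""
    best, best_e = (0, 0), float("inf")
    for r, row in enumerate(appearance):
        for c, a in enumerate(row):
            e = -a - lam * motion[r][c]
            if e < best_e:
                best, best_e = (r, c), e
    return best


# Toy example: a 2x2 deep-layer map and a 4x4 shallow-layer map.
coarse = [[0.0, 1.0],
          [0.0, 0.0]]
fine = [[0.0] * 4 for _ in range(4)]
fine[1][2] = 0.5  # shallow layer's strongest appearance peak
fine[0][3] = 0.2
motion = [[0.0] * 4 for _ in range(4)]
motion[0][3] = 1.0  # hypothetical motion cue from the previous frame

fused = fuse_hierarchical([coarse, fine], [1.0, 0.5])
print(localize(fused, motion, lam=0.0))  # appearance only → (1, 2)
print(localize(fused, motion, lam=0.5))  # motion-guided → (0, 3)
```

With λ = 0 the estimate follows the fused appearance peak alone; a positive λ lets the motion cue pull it toward the moving region, which illustrates, in a much simplified form, why the abstract pairs intra-frame appearance correlations with inter-frame motion guidance.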