Robust Joint Discriminative Feature Learning for Visual Tracking

Because multiple visual cues (features) are complementary in appearance modeling, many tracking algorithms fuse multiple features to improve tracking performance in two respects: increasing the representation accuracy against appearance variations and enhancing the discriminability between the tracked target and its background. Since both aspects contribute to the success of a visual tracker, how to fully exploit the capabilities of multiple features in both aspects is a key issue for feature fusion-based visual tracking. Unlike other feature fusion-based trackers, which consider only one of these two aspects, this paper proposes a unified feature learning framework that simultaneously exploits both the representation capability and the discriminability of multiple features for visual tracking. In particular, the proposed feature learning framework is capable of: 1) learning robust features by separating out corrupted features for accurate feature representation, 2) seamlessly imposing the discriminability of multiple visual cues into feature learning, and 3) fusing features by exploiting their shared and feature-specific discriminative information. Extensive experimental results on challenging videos show that the proposed tracker performs favourably against ten other state-of-the-art trackers.
