Key Point Detection by Max Pooling for Tracking

Inspired by the recent image feature learning work, we propose a novel key point detection approach for object tracking. Our approach can select mid-level interest key points by max pooling over the local descriptor responses from a set of filters. Linear filters are first learned from targets in first frames. Then max pooling is performed over data driven spatial supporting field to detect discriminant key points, and thus the detected key points bear higher level semantic meanings, which we apply in tracking by structured key point matching. We show that our tracking system is robust to occlusions and cluttered background. Testing on several challenging tracking sequences, we demonstrate that our proposed tracking system can achieve competitive or better performances than the state-of-the-art trackers.

[1]  Sanjoy Dasgupta,et al.  Learning the structure of manifolds using random projections , 2007, NIPS.

[2]  Meng Wang,et al.  Event Driven Web Video Summarization by Tag Localization and Key-Shot Identification , 2012, IEEE Transactions on Multimedia.

[3]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[5]  Yanxi Liu,et al.  Online Selection of Discriminative Tracking Features , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Ehud Rivlin,et al.  Robust Fragments-based Tracking using the Integral Histogram , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Ming-Hsuan Yang,et al.  Visual tracking with online Multiple Instance Learning , 2009, CVPR.

[8]  Rynson W. H. Lau,et al.  Visual Tracking via Locality Sensitive Histograms , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Huchuan Lu,et al.  Visual tracking via adaptive structural local sparse appearance model , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Mubarak Shah,et al.  Tracking and Object Classification for Automated Surveillance , 2002, ECCV.

[11]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[12]  Natasha Gelfand,et al.  SURFTrac: Efficient tracking and continuous object recognition using local feature descriptors , 2009, CVPR.

[13]  Haibin Ling,et al.  Robust Visual Tracking using 1 Minimization , 2009 .

[14]  Takahiro Ishikawa,et al.  The template update problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Narendra Ahuja,et al.  Robust Visual Tracking via Structured Multi-Task Sparse Learning , 2012, International Journal of Computer Vision.

[18]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[19]  Gary R. Bradski,et al.  Real time face and object tracking as a component of a perceptual user interface , 1998, Proceedings Fourth IEEE Workshop on Applications of Computer Vision. WACV'98 (Cat. No.98EX201).

[20]  Huiyu Zhou,et al.  Object tracking using SIFT features and mean shift , 2009, Comput. Vis. Image Underst..

[21]  Ming Yang,et al.  Tracking non-stationary appearances and dynamic feature selection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Junzhou Huang,et al.  Robust tracking using local sparse appearance model and K-selection , 2011, CVPR 2011.

[23]  Ying Wu,et al.  Discriminative Spatial Attention for Robust Tracking , 2010, ECCV.

[24]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  Ales Leonardis,et al.  An adaptive coupled-layer visual model for robust visual tracking , 2011, 2011 International Conference on Computer Vision.

[26]  Yi Wu,et al.  Online Object Tracking: A Benchmark , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.