Robust Visual Vocabulary Tracking Using Hierarchical Model Fusion

In this paper, we propose a new visual tracking approach based on a Hierarchical Model Fusion framework, which fuses two different trackers to cope with different tracking problems. An Incremental Multiple Principal Component Analysis tracker serves as the main model and an image patch tracker as the auxiliary model. First, we randomly sample image patches within the target region obtained by the main model in the training frames and construct a visual vocabulary from their Histogram of Oriented Gradients (HOG) features. Second, we apply a supervised learning algorithm based on a Gaussian Mixture Model, which exploits the supervised information to improve the discriminative power of the clusters and to increase their purity. Third, the auxiliary models are initialised by assigning confidence scores to image patches based on the similarity between candidates and codewords. The proposed approach also includes an updating procedure and a result refinement scheme. Experiments on challenging video sequences demonstrate that the approach is robust to occlusion, pose variation and rotation.
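The patch sampling, vocabulary construction and confidence scoring steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the descriptor is a single gradient-orientation histogram per patch (a simplification of HOG), plain k-means stands in for the supervised Gaussian Mixture Model clustering, and confidence is taken to be the maximum cosine similarity to any codeword. All function names, the box layout `(x, y, w, h)` and the parameter values are hypothetical.

```python
import numpy as np


def sample_patches(frame, box, n_patches, patch_size, rng):
    """Randomly sample square patches lying fully inside box = (x, y, w, h)."""
    x, y, w, h = box
    cols = rng.integers(x, x + w - patch_size + 1, size=n_patches)
    rows = rng.integers(y, y + h - patch_size + 1, size=n_patches)
    return np.stack([frame[r:r + patch_size, c:c + patch_size]
                     for r, c in zip(rows, cols)])


def orientation_histogram(patch, n_bins=9):
    """Simplified HOG-like descriptor (assumption): one magnitude-weighted
    histogram of unsigned gradient orientations for the whole patch."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # fold into [0, pi]
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, np.pi),
                           weights=magnitude)
    return hist / (np.linalg.norm(hist) + 1e-12)


def kmeans(X, k, rng, n_iter=20):
    """Plain k-means, used here only as an unsupervised stand-in for the
    supervised GMM clustering described in the abstract."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def confidence_scores(descriptors, codewords):
    """Confidence of each candidate (assumption): maximum cosine similarity
    between its descriptor and any codeword in the vocabulary."""
    d = descriptors / (np.linalg.norm(descriptors, axis=1, keepdims=True) + 1e-12)
    c = codewords / (np.linalg.norm(codewords, axis=1, keepdims=True) + 1e-12)
    return (d @ c.T).max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))   # synthetic grayscale frame
    target_box = (10, 12, 30, 28)  # would come from the main tracker
    patches = sample_patches(frame, target_box, n_patches=40,
                             patch_size=8, rng=rng)
    descriptors = np.stack([orientation_histogram(p) for p in patches])
    vocabulary = kmeans(descriptors, k=5, rng=rng)
    print(confidence_scores(descriptors, vocabulary))
```

In a real tracker the target box would be supplied by the main model on each training frame, and the candidate patches scored against the vocabulary would come from the current frame rather than from the training patches themselves.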
