A unified supervised codebook learning framework for classification

In this paper, we investigate a discriminative visual dictionary learning method for boosting the classification performance. Tied to the K-means clustering philosophy, those popular algorithms for visual dictionary learning cannot guarantee the well-separation of the normalized visual word frequency vectors from distinctive classes or large label distances. The rationale of this work is to harness sample label information for learning visual dictionary in a supervised manner, and this target is then formulated as an objective function, where each sample element, e.g., SIFT descriptor, is expected to be close to its assigned visual word, and at the same time the normalized aggregative visual word frequency vectors are expected to possess the property that kindred samples shall be close to each other while inhomogeneous samples shall be far away. By relaxing the hard binary constraints to soft nonnegative ones, a multiplicative nonnegative update procedure is proposed to optimize the objective function along with theoretic convergence proof. Extensive experiments on classification tasks (i.e., natural scene and sports event classifications) all demonstrate the superiority of this proposed framework over conventional clustering based visual dictionary learning.

[1]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[2]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[3]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[4]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[5]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Ming Liu,et al.  Regression from patch-kernel , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[9]  Yihong Gong,et al.  Discriminative learning of visual words for 3D human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[12]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2008, International Journal of Computer Vision.

[13]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[14]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[15]  Shuicheng Yan,et al.  Graph Embedding and Extensions: A General Framework for Dimensionality Reduction , 2007 .

[16]  Kurt Hornik,et al.  The support vector machine under test , 2003, Neurocomputing.

[17]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[19]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[20]  Svetlana Lazebnik,et al.  Supervised Learning of Quantizer Codebooks by Information Loss Minimization , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.