Hierarchical Gaussianization for image classification

In this paper, we propose a new image representation to capture both the appearance and spatial information for image classification applications. First, we model the feature vectors, from the whole corpus, from each image and at each individual patch, in a Bayesian hierarchical framework using mixtures of Gaussians. After such a hierarchical Gaussianization, each image is represented by a Gaussian mixture model (GMM) for its appearance, and several Gaussian maps for its spatial layout. Then we extract the appearance information from the GMM parameters, and the spatial information from global and local statistics over Gaussian maps. Finally, we employ a supervised dimension reduction technique called DAP (discriminant attribute projection) to remove noise directions and to further enhance the discriminating power of our representation. We justify that the traditional histogram representation and the spatial pyramid matching are special cases of our hierarchical Gaussianization. We compare our new representation with other approaches in scene classification, object recognition and face recognition, and our performance ranks among the top in all three tasks.

[1]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[2]  Daniel Povey,et al.  Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..

[3]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[4]  Tsuhan Chen,et al.  A GMM parts based face representation for improved verification through relevance adaptation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[6]  Bernt Schiele,et al.  Recognition without Correspondence using Multidimensional Receptive Field Histograms , 2004, International Journal of Computer Vision.

[7]  Tsuhan Chen,et al.  A GMM parts based face representation for improved verification through relevance adaptation , 2004, CVPR 2004.

[8]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  John Shawe-Taylor,et al.  Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels , 2005 .

[10]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Jiawei Han,et al.  Orthogonal Laplacianfaces for Face Recognition , 2006, IEEE Transactions on Image Processing.

[12]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[14]  Andreas Stolcke,et al.  Within-class covariance normalization for SVM-based speaker recognition , 2006, INTERSPEECH.

[15]  Frédéric Jurie,et al.  Latent mixture vocabularies for object categorization and segmentation , 2006, Image Vis. Comput..

[16]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Cordelia Schmid,et al.  Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[18]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.

[19]  Manik Varma,et al.  Learning The Discriminative Power-Invariance Trade-Off , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Cordelia Schmid,et al.  Vector Quantizing Feature Space with a Regular Lattice , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[21]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[22]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[23]  Ming Liu,et al.  Regression from patch-kernel , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Prateek Jain,et al.  Fast image search for learned metrics , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Shuicheng Yan,et al.  SIFT-Bag kernel for video event analysis , 2008, ACM Multimedia.

[26]  Florent Perronnin,et al.  A similarity measure between unordered vector sets with application to image categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[30]  Rong Jin,et al.  Unifying discriminative visual codebook generation with classifier training for object category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.