Hierarchical density estimation for image classification

This paper presents a novel hierarchical density estimation approach for image classification. We first build a forest of randomized decision trees, trained discriminatively, to partition the feature space into small regions. For each region, class-conditional Gaussians are then learned to characterize the “local” distribution of the feature vectors falling into that region. The Gaussian parameters are estimated reliably through hierarchical maximum a posteriori (MAP) estimation and smoothed across the randomized trees in the forest. Compared with the widely used Gaussian mixture model (GMM), our approach not only yields more reliable parameter estimates but also greatly reduces the computational cost at test time. Experiments on scene classification demonstrate the effectiveness and efficiency of the approach.
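The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation; the abstract does not specify the split criterion, the prior, or the smoothing rule, so the following are assumptions: splits are chosen among random (feature, threshold) candidates by weighted class entropy, Gaussians are diagonal, each node's parameters are shrunk toward its parent's with a hypothetical prior weight `tau`, and smoothing across trees is done by averaging log-likelihoods. All function names are illustrative.

```python
import numpy as np

def entropy(y, n_classes):
    p = np.bincount(y, minlength=n_classes) / len(y)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def map_gaussians(X, y, n_classes, prior, tau):
    """MAP-style update of per-class diagonal Gaussians, shrunk toward the parent's."""
    mu, var = prior[0].copy(), prior[1].copy()
    for c in range(n_classes):
        Xc = X[y == c]
        if len(Xc) == 0:
            continue  # no samples of this class here: inherit the parent's parameters
        w = len(Xc) / (len(Xc) + tau)  # more data -> trust the local estimate more
        mu[c] = w * Xc.mean(0) + (1 - w) * prior[0][c]
        var[c] = w * ((Xc - mu[c]) ** 2).mean(0) + (1 - w) * prior[1][c]
    return mu, var

def build(X, y, n_classes, prior, rng, depth, tau=10.0, n_tries=10, min_leaf=5):
    params = map_gaussians(X, y, n_classes, prior, tau)
    node = {"params": params}
    if depth == 0 or len(y) < 2 * min_leaf:
        return node
    best = None
    for _ in range(n_tries):  # random candidate splits, scored discriminatively
        f = rng.integers(X.shape[1])
        t = rng.choice(X[:, f])
        left = X[:, f] < t
        nl = int(left.sum())
        if nl < min_leaf or len(y) - nl < min_leaf:
            continue
        score = nl * entropy(y[left], n_classes) + (len(y) - nl) * entropy(y[~left], n_classes)
        if best is None or score < best[0]:
            best = (score, f, t, left)
    if best is None:
        return node
    _, f, t, left = best
    node["f"], node["t"] = f, t
    # Children use this node's estimates as their prior (the hierarchical part).
    node["lo"] = build(X[left], y[left], n_classes, params, rng, depth - 1, tau, n_tries, min_leaf)
    node["hi"] = build(X[~left], y[~left], n_classes, params, rng, depth - 1, tau, n_tries, min_leaf)
    return node

def train_forest(X, y, n_classes, n_trees, rng, depth=4):
    # Root prior: one global Gaussian shared by all classes.
    root_prior = (np.tile(X.mean(0), (n_classes, 1)),
                  np.tile(X.var(0) + 1e-6, (n_classes, 1)))
    return [build(X, y, n_classes, root_prior, rng, depth) for _ in range(n_trees)]

def log_lik(forest, x):
    """Per-class log-likelihood of one feature vector, averaged over trees."""
    total = 0.0
    for node in forest:
        while "f" in node:  # descend to the leaf region containing x
            node = node["lo"] if x[node["f"]] < node["t"] else node["hi"]
        mu, var = node["params"]
        total = total + (-0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)).sum(1)
    return total / len(forest)

def classify(forest, bag):
    """Label a bag of feature vectors (e.g. one image) by summed log-likelihood."""
    return int(np.argmax(sum(log_lik(forest, x) for x in bag)))

# Toy data: two well-separated classes in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.repeat([0, 1], 200)
forest = train_forest(X, y, n_classes=2, n_trees=5, rng=rng)
```

Scoring a feature vector only requires descending each tree and evaluating one Gaussian per class at the leaf, rather than evaluating every component of a mixture, which is consistent with the reduced test-time cost claimed above.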
