Heterogeneous image feature integration via multi-modal spectral clustering

In recent years, more and more visual descriptors have been proposed to describe objects and scenes appearing in images. Different features describe different aspects of the visual characteristics. How to combine these heterogeneous features has become an increasing critical problem. In this paper, we propose a novel approach to unsupervised integrate such heterogeneous features by performing multi-modal spectral clustering on unlabeled images and unsegmented images. Considering each type of feature as one modal, our new multi-modal spectral clustering (MMSC) algorithm is to learn a commonly shared graph Laplacian matrix by unifying different modals (image features). A non-negative relaxation is also added in our method to improve the robustness and efficiency of image clustering. We applied our MMSC method to integrate five types of popularly used image features, including SIFT, HOG, GIST, LBP, CENTRIST and evaluated the performance by two benchmark data sets: Caltech-101 and MSRC-v1. Compared with existing unsupervised scene and object categorization methods, our approach always achieves superior performances measured by three standard clustering evaluation metrices.

[1]  Daniel D. Lee,et al.  Multiplicative Updates for Classification by Mixture Models , 2001, NIPS.

[2]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  Jianbo Shi,et al.  Multiclass spectral clustering , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[5]  Yong Jae Lee,et al.  Foreground Focus: Unsupervised Learning from Partially Matching Images , 2009, International Journal of Computer Vision.

[6]  Jiebo Luo,et al.  Heterogeneous feature machines for visual recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Trevor Darrell,et al.  Unsupervised Learning of Categories from Sets of Partially Matching Image Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[10]  Brendan J. Frey,et al.  Non-metric affinity propagation for unsupervised image categorization , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Chris H. Q. Ding,et al.  Nonnegative Matrix Factorization for Combinatorial Optimization: Spectral Clustering, Graph Matching, and Clique Finding , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[12]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[13]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[15]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Ivor W. Tsang,et al.  Dynamic vehicle routing with stochastic requests , 2003, IJCAI 2003.

[17]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[18]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[19]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[20]  James M. Rehg,et al.  Where am I: Place instance and category recognition using spatial PACT , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[22]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[23]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[24]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).