Multimodal visual dictionary learning via heterogeneous latent semantic sparse coding

Visual dictionary learning as a crucial task of image representation has gained increasing attention. Specifically, sparse coding is widely used due to its intrinsic advantage. In this paper, we propose a novel heterogeneous latent semantic sparse coding model. The central idea is to bridge heterogeneous modalities by capturing their common sparse latent semantic structure so that the learned visual dictionary is able to describe both the visual and textual properties of training data. Experiments on both image categorization and retrieval tasks demonstrate that our model shows superior performance over several recent methods such as K-means and Sparse Coding.

[1]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[2]  Chun Chen,et al.  Graph Regularized Sparse Coding for Image Representation , 2011, IEEE Transactions on Image Processing.

[3]  Nuno Vasconcelos,et al.  Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.

[4]  Hai Jin,et al.  Label to region by bi-layer sparsity priors , 2009, MM '09.

[5]  Cordelia Schmid,et al.  Multimodal semi-supervised learning for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[7]  Liang-Tien Chia,et al.  Local features are not lonely – Laplacian sparse coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Larry S. Davis,et al.  Learning a discriminative dictionary for sparse coding via label consistent K-SVD , 2011, CVPR 2011.

[9]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[10]  Larry S. Davis,et al.  Submodular dictionary learning for sparse coding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Eric P. Xing,et al.  Sparse Additive Generative Models of Text , 2011, ICML.

[12]  Jianping Fan,et al.  Learning inter-related visual dictionary for object recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[14]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Guillermo Sapiro,et al.  On the Integration of Topic Modeling and Dictionary Learning , 2011, ICML.

[16]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[18]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .