Multi-label sparse coding for automatic image annotation

In this paper, we present a multi-label sparse coding framework for feature extraction and classification within the context of automatic image annotation. First, each image is encoded into a so-called supervector, derived from the universal Gaussian Mixture Models on orderless image patches. Then, a label sparse coding based subspace learning algorithm is derived to effectively harness multi-label information for dimensionality reduction. Finally, the sparse coding method for multi-label data is proposed to propagate the multi-labels of the training images to the query image with the sparse ℓ1 reconstruction coefficients. Extensive image annotation experiments on the Corel5k and Corel30k databases both show the superior performance of the proposed multi-label sparse coding framework over the state-of-the-art algorithms.

[1]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[2]  Biing-Hwang Juang,et al.  A study on speaker adaptation of the parameters of continuous density hidden Markov models , 1991, IEEE Trans. Signal Process..

[3]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[4]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[5]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[6]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[7]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[8]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[9]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[10]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[11]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[12]  Raimondo Schettini,et al.  Image annotation using SVM , 2003, IS&T/SPIE Electronic Imaging.

[13]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[15]  Changhu Wang,et al.  Image annotation refinement using random walk with restarts , 2006, MM '06.

[16]  Nuno Vasconcelos,et al.  Using Statistics to Search and Annotate Pictures: an Evaluation of Semantic Image Annotation and Retrieval on Large Databases , 2006 .

[17]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[18]  Rong Jin,et al.  Correlated Label Propagation with Application to Multi-label Learning , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  R. Fergus,et al.  Tiny images , 2007 .

[20]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Qi Zhang,et al.  Automatic image annotation by an iterative approach: incorporating keyword correlations and region matching , 2007, CIVR '07.

[22]  Changhu Wang,et al.  Learning to reduce the semantic gap in web image retrieval and annotation , 2008, SIGIR '08.

[23]  Changhu Wang,et al.  Scalable search-based image annotation , 2008, Multimedia Systems.

[24]  Heng Tao Shen,et al.  Principal Component Analysis , 2009, Encyclopedia of Biometrics.