Concurrent Single-Label Image Classification and Annotation via Efficient Multi-Layer Group Sparse Coding

We present a multi-layer group sparse coding framework for concurrent single-label image classification and annotation. By leveraging the dependency between an image's class label and its tags, we impose a multi-layer group sparse structure on the reconstruction coefficients. This structure encodes the mutual dependency between the class label, which describes the image content as a whole, and the tags, which describe the components of that content. Building on this structure, we propose a multi-layer group based tag propagation method, which combines the class label with subgroups of instances that share a similar tag distribution to annotate test images. To make our model better suited to features that are not linearly separable, we also extend multi-layer group sparse coding to the Reproducing Kernel Hilbert Space (RKHS), which further improves both classification and annotation performance. Moreover, we integrate multi-layer group sparse coding with a kNN strategy, which greatly improves computational efficiency. Experimental results on the LabelMe, UIUC-Sports and NUS-WIDE-Object databases show that our method outperforms the baseline methods on both the image classification and the annotation task.
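As an illustration of the idea only, the following minimal sketch codes a test feature over training instances with a two-layer group penalty (tag subgroups nested inside class groups), assigns the class with the smallest reconstruction residual, and propagates tags from the instances of that class. It is not the implementation described in the abstract: the proximal-gradient solver, the penalty weights `lam_sub` and `lam_class`, the residual-based decision rule, the coefficient-weighted tag score, and all function names and toy data are assumptions made for this example. The kernel (RKHS) extension and the kNN speed-up are omitted.

```python
import numpy as np


def group_soft_threshold(v, threshold):
    """Proximal operator of threshold * ||v||_2: shrinks the whole group."""
    norm = np.linalg.norm(v)
    if norm <= threshold:
        return np.zeros_like(v)
    return (1.0 - threshold / norm) * v


def two_layer_prox(z, subgroups, classes, t_sub, t_class):
    """Shrink nested groups from the inner layer (subgroups) to the outer (classes)."""
    w = z.copy()
    for idx in subgroups:
        w[idx] = group_soft_threshold(w[idx], t_sub)
    for idx in classes:
        w[idx] = group_soft_threshold(w[idx], t_class)
    return w


def two_layer_group_sparse_code(x, D, subgroups, classes,
                                lam_sub=0.05, lam_class=0.05, n_iter=300):
    """Proximal gradient for
    0.5*||x - D w||^2 + lam_sub * sum ||w_subgroup|| + lam_class * sum ||w_class||.
    """
    w = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = w - step * (D.T @ (D @ w - x))
        w = two_layer_prox(z, subgroups, classes, step * lam_sub, step * lam_class)
    return w


def classify(x, D, classes, w):
    """Assumed decision rule: class whose atoms reconstruct x with least residual."""
    residuals = [np.linalg.norm(x - D[:, idx] @ w[idx]) for idx in classes]
    return int(np.argmin(residuals))


def propagate_tags(T, classes, w, label):
    """Assumed tag score: tags of the predicted class's instances, weighted by |w|."""
    idx = classes[label]
    return T[:, idx] @ np.abs(w[idx])


# Toy data (hypothetical): 6 training instances as dictionary columns, 2 classes.
D = np.array([[1.0, 0.9, 0.8, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0, 0.9, 0.8],
              [0.0, 0.0, 0.2, 0.0, 0.0, 0.2]])
D = D / np.linalg.norm(D, axis=0)           # unit-norm atoms
classes = [[0, 1, 2], [3, 4, 5]]            # outer layer: one group per class
subgroups = [[0], [1, 2], [3, 4], [5]]      # inner layer: similar tag distributions
T = np.array([[1, 1, 0, 0, 0, 0],           # tag-by-instance indicator matrix
              [0, 0, 1, 1, 1, 0],
              [0, 0, 0, 0, 1, 1]], dtype=float)
x = np.array([0.05, 1.0, 0.1])              # test feature, closest to class 1

w = two_layer_group_sparse_code(x, D, subgroups, classes)
label = classify(x, D, classes, w)
tag_scores = propagate_tags(T, classes, w, label)
```

Applying the group shrinkage from the inner layer outward is the standard way to evaluate the proximal operator of a tree-structured group norm, which is why the subgroups must be nested inside the class groups in this sketch.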
