Learning Hierarchical Semantic Description Via Mixed-Norm Regularization for Image Understanding

This paper proposes a new perspective-Vicept representation to solve the problem of visual polysemia and concept polymorphism in the large-scale semantic image understanding. Vicept characterizes the membership probability distribution between visual appearances and semantic concepts, and forms a hierarchical representation of image semantic from local to global. In the implementation, incorporating group sparse coding, visual appearance is encoded as a weighted sum of dictionary elements, which could obtain more accurate image representation with sparsity at the image level. To obtain discriminative Vicept descriptions with structural sparsity, mixed-norm regularization is adopted in the optimization problem for learning the concept membership distribution of visual appearance. Furthermore, we introduce a novel image distance measurement based on the hierarchical Vicept description, where different levels of Vicept distance are fused together by multi-level separability analysis. Finally, the wide applications of Vicept description are validated in our experiments, including large-scale semantic image search, image annotation, and semantic image re-ranking.

[1]  Wen Gao,et al.  Vicept: link visual features to concepts for large-scale image understanding , 2010, ACM Multimedia.

[2]  Martin J. Wainwright,et al.  Phase transitions for high-dimensional joint support recovery , 2008, NIPS.

[3]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Liang-Tien Chia,et al.  Image-to-Class Distance Metric Learning for Image Classification , 2010, ECCV.

[7]  Charu C. Aggarwal,et al.  Towards cross-category knowledge propagation for learning visual concepts , 2011, CVPR 2011.

[8]  G. Obozinski Joint covariate selection for grouped classification , 2007 .

[9]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[10]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[12]  Cordelia Schmid,et al.  Improving Bag-of-Features for Large Scale Image Search , 2010, International Journal of Computer Vision.

[13]  Cordelia Schmid,et al.  Vector Quantizing Feature Space with a Regular Lattice , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[14]  Cor J. Veenman,et al.  Comparing compact codebooks for visual categorization , 2010, Comput. Vis. Image Underst..

[15]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Rong Jin,et al.  Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition , 2010, NIPS.

[17]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[19]  Frédéric Jurie,et al.  Randomized Clustering Forests for Image Classification , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[21]  Martial Hebert,et al.  Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation , 2008, ECCV.

[22]  Dong Liu,et al.  Image retagging , 2010, ACM Multimedia.

[23]  Florent Perronnin,et al.  Large-scale image retrieval with compressed Fisher vectors , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[26]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[29]  Hongping Cai,et al.  Learning weights for codebook in image classification and retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Alexander M. Bronstein,et al.  Spatially-Sensitive Affine-Invariant Image Descriptors , 2010, ECCV.

[31]  Ali Farhadi,et al.  Recognition using visual phrases , 2011, CVPR 2011.

[32]  Shumeet Baluja,et al.  VisualRank: Applying PageRank to Large-Scale Image Search , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Le Li,et al.  SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding: SENSC: a Stable and Efficient Algorithm for Nonnegative Sparse Coding , 2009 .

[34]  Shuicheng Yan,et al.  Inferring semantic concepts from community-contributed images and noisy tags , 2009, ACM Multimedia.

[35]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[36]  Cordelia Schmid,et al.  Multiple Instance Metric Learning from Automatically Labeled Bags of Faces , 2010, ECCV.

[37]  Yoram Singer,et al.  Boosting with structural sparsity , 2009, ICML '09.

[38]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[39]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Qingming Huang,et al.  Learning image Vicept description via mixed-norm regularization for large scale semantic image search , 2011, CVPR 2011.

[43]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[44]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Kilian Q. Weinberger,et al.  Fast solvers and efficient implementations for distance metric learning , 2008, ICML '08.