Towards multi-semantic image annotation with graph regularized exclusive group lasso

To bridge the semantic gap between low level feature and human perception, most of the existing algorithms aim mainly at annotating images with concepts coming from only one semantic space, e.g. cognitive or affective. The naive combination of the outputs from these spaces will implicitly force the conditional independence and ignore the correlations among the spaces. In this paper, to exploit the comprehensive semantic of images, we propose a general framework for harmoniously integrating the above multiple semantics, and investigating the problem of learning to annotate images with training images labeled in two or more correlated semantic spaces, such as fascinating nighttime, or exciting cat. This kind of semantic annotation is more oriented to real world search scenario. Our proposed approach outperforms the baseline algorithms by making the following contributions. 1) Unlike previous methods that annotate images within only one semantic space, our proposed multi-semantic annotation associates each image with labels from multiple semantic spaces. 2) We develop a multi-task linear discriminative model to learn a linear mapping from features to labels. The tasks are correlated by imposing the exclusive group lasso regularization for competitive feature selection, and the graph Laplacian regularization to deal with insufficient training sample issue. 3) A Nesterov-type smoothing approximation algorithm is presented for efficient optimization of our model. Extensive experiments on NUS-WIDEEmotive dataset (56k images) with 8×81 emotive cognitive concepts and Object&Scene datasets from NUS-WIDE well validate the effectiveness of the proposed approach.

[1]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[2]  Sam J. Maglio,et al.  Emotional category data on images from the international affective picture system , 2005, Behavior research methods.

[3]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[4]  Nicu Sebe,et al.  Emotional valence categorization using holistic image features , 2008, 2008 15th IEEE International Conference on Image Processing.

[5]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[6]  Meng Wang,et al.  Visual query suggestion , 2009, ACM Multimedia.

[7]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[8]  Meng Wang,et al.  Unified Video Annotation via Multigraph Learning , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Emmanuel J. Candès,et al.  NESTA: A Fast and Accurate First-Order Method for Sparse Recovery , 2009, SIAM J. Imaging Sci..

[10]  Allan Hanbury,et al.  Affective image classification using features inspired by psychology and art theory , 2010, ACM Multimedia.

[11]  Yu Ying-lin,et al.  Image Retrieval by Emotional Semantics: A Study of Emotional Space and Feature Extraction , 2006, 2006 IEEE International Conference on Systems, Man and Cybernetics.

[12]  P. Zhao,et al.  The composite absolute penalties family for grouped and hierarchical variable selection , 2009, 0909.0411.

[13]  Shuicheng Yan,et al.  Efficient large-scale image annotation by probabilistic collaborative multi-label propagation , 2010, ACM Multimedia.

[14]  Massimo Fornasier,et al.  Recovery Algorithms for Vector-Valued Data with Joint Sparsity Constraints , 2008, SIAM J. Numer. Anal..

[15]  Meng Wang,et al.  Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.

[16]  Fei Wang,et al.  Label Propagation through Linear Neighborhoods , 2006, IEEE Transactions on Knowledge and Data Engineering.

[17]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[18]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[19]  Masafumi Hagiwara,et al.  Image query by impression words-the IQI system , 1998 .

[20]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[21]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[22]  Rich Caruana,et al.  Multitask Learning , 1997, Machine Learning.

[23]  Tao Mei,et al.  Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Jeff A. Bilmes,et al.  Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification , 2009, NIPS.

[25]  Shengming Jiang,et al.  Image Retrieval by Emotional Semantics: A Study of Emotional Space and Feature Extraction , 2006, SMC.

[26]  Shuicheng Yan,et al.  Inferring semantic concepts from community-contributed images and noisy tags , 2009, ACM Multimedia.

[27]  Thomas L. Griffiths,et al.  Infinite latent feature models and the Indian buffet process , 2005, NIPS.

[28]  Changle Zhou,et al.  Content-Based Affective Image Classification and Retrieval Using Support Vector Machines , 2005, ACII.

[29]  Massimiliano Pontil,et al.  Multi-task Learning , 2020, Transfer Learning.

[30]  Yi Liu,et al.  Semi-supervised Multi-label Learning by Constrained Non-negative Matrix Factorization , 2006, AAAI.

[31]  Nicu Sebe,et al.  Content-based multimedia information retrieval: State of the art and challenges , 2006, TOMCCAP.

[32]  Yihong Gong,et al.  Multi-labelled classification using maximum entropy method , 2005, SIGIR '05.

[33]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[34]  A. Hanjalic,et al.  Extracting moods from pictures and sounds: towards truly personalized TV , 2006, IEEE Signal Processing Magazine.

[35]  M. Kowalski Sparse regression using mixed norms , 2009 .

[36]  Dong Liu,et al.  Tag ranking , 2009, WWW '09.

[37]  Naonori Ueda,et al.  Parametric Mixture Models for Multi-Labeled Text , 2002, NIPS.

[38]  Bruno Torrésani,et al.  Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients , 2009, Signal Image Video Process..

[39]  Kraisak Kesorn,et al.  Multi modal multi-semantic image retrieval , 2010 .

[40]  Ben Taskar,et al.  Joint covariate selection and joint subspace selection for multiple classification problems , 2010, Stat. Comput..

[41]  Paul M. B. Vitányi,et al.  The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[42]  Han Liu,et al.  Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery , 2009, ICML '09.

[43]  Rong Jin,et al.  Exclusive Lasso for Multi-task Feature Selection , 2010, AISTATS.

[44]  Jean-Philippe Vert,et al.  Group lasso with overlap and graph lasso , 2009, ICML '09.