Optimized Graph Learning Using Partial Tags and Multiple Features for Image and Video Annotation

In multimedia annotation, manual tagging is time-consuming and tedious, so it is common to use both tagged and untagged data to improve supervised learning when only limited tagged training data are available. This is often done by adding a geometry-based regularization term to the objective function of a supervised learning model. In this setting, a similarity graph is indispensable for exploiting the geometrical relationships among the training data points, and the graph construction scheme essentially determines the performance of these graph-based learning algorithms. However, most existing works construct the graph empirically, usually from a single feature and without using the label information. In this paper, we propose a semi-supervised annotation approach that learns an optimized graph (OGL) from multiple cues (i.e., partial tags and multiple features), which can more accurately embed the relationships among the data points. Since OGL is a transductive method and cannot handle novel data points, we further extend our model to address the out-of-sample issue. Extensive experiments on image and video annotation show the consistent superiority of OGL over state-of-the-art methods.
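To make the role of the similarity graph concrete, the following is a minimal sketch of generic graph-based semi-supervised learning, not the OGL method itself. It builds a fixed Gaussian similarity graph from a single feature (the kind of empirical construction the abstract criticizes) and spreads two known tags to untagged points by harmonic label propagation. The function names, the toy data, and the bandwidth `sigma` are all illustrative assumptions.

```python
import math


def gaussian_graph(points, sigma=1.0):
    """Dense similarity graph with W[i][j] = exp(-||xi - xj||^2 / (2 sigma^2))."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W


def propagate_labels(W, labels, n_iter=200):
    """Harmonic label propagation: tagged points stay clamped to their label,
    each untagged point is repeatedly set to the weighted mean of its neighbors."""
    n = len(W)
    f = [float(labels.get(i, 0.0)) for i in range(n)]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            if i in labels:
                new_f.append(float(labels[i]))
            else:
                degree = sum(W[i])
                new_f.append(sum(W[i][j] * f[j] for j in range(n)) / degree)
        f = new_f
    return f


def smoothness(W, f):
    """Geometry-based regularizer 0.5 * sum_ij W_ij (f_i - f_j)^2, i.e. f^T L f
    with graph Laplacian L = D - W."""
    n = len(W)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2
                     for i in range(n) for j in range(n))


# Toy data (assumed): two well-separated 1-D clusters, one tagged point in each.
points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
labels = {0: 1, 5: -1}  # index -> tag (+1 / -1); all other points are untagged

W = gaussian_graph(points, sigma=0.5)
scores = propagate_labels(W, labels)
predicted = [1 if s > 0 else -1 for s in scores]
print(predicted)
print(smoothness(W, scores))
```

In this sketch the graph is fixed before learning starts, so the quality of the propagated tags depends entirely on the chosen feature and bandwidth. The abstract's proposal is instead to learn the graph from partial tags and multiple features.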
