Joint multi-view representation and image annotation via optimal predictive subspace learning

Abstract

Image representation and annotation are two key tasks in practical applications such as image search. Existing methods have tried either to learn an effective representation or to predict tags directly from multi-view low-level visual features, which usually contain redundant information. However, these two tasks are closely related and interact with each other: a suitable image representation yields better annotation results, which in turn can effectively guide representation learning. In this paper, we propose to jointly perform multi-view representation learning and image annotation via optimal predictive subspace learning, so that the two tasks promote each other. Specifically, for subspace learning, the visual structure and semantic information of images are exploited to make the learned subspace more discriminative and compact. For tag prediction, support vector machines (SVMs) are adopted to obtain better tag prediction results. To learn the image representation, the tag predictors, and the projection function simultaneously, the three subproblems are combined into a unified objective function, and an alternating optimization algorithm is derived to solve it. Experimental results on four image datasets show that our method outperforms other image annotation methods.
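The abstract does not state the objective function, so the following is only a minimal sketch of the joint, alternating scheme it describes. It rests on assumptions that do not come from the paper. Each view X_v is mapped by a linear projection P_v into a shared representation V. Visual structure is encoded as a kNN graph-Laplacian smoothness term tr(VᵀLV). Semantic information enters through a term that fits the tag matrix Y from V with predictors W. A squared loss replaces the SVM hinge loss so that every block update has a closed form. The assumed objective is J = Σ_v ||X_v P_v − V||² + alpha·tr(VᵀLV) + beta·||VW − Y||² + lam·(Σ_v ||P_v||² + ||W||²). All function names (`knn_laplacian`, `objective`, `fit`, `predict_scores`) and the weights `alpha`, `beta`, `lam` are hypothetical.

```python
import numpy as np


def knn_laplacian(X, n_neighbors=5):
    """Unnormalized Laplacian L = D - S of a symmetric kNN heat-kernel graph (assumed graph)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = np.mean(d2) + 1e-12                  # heat-kernel bandwidth (assumption)
    sim = np.exp(-d2 / sigma2)
    masked = d2.copy()
    np.fill_diagonal(masked, np.inf)              # exclude self-loops from neighbor search
    nbrs = np.argsort(masked, axis=1)[:, :n_neighbors]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nbrs] = True
    mask |= mask.T                                # keep an edge if either endpoint selects it
    S = np.where(mask, sim, 0.0)
    S = 0.5 * (S + S.T)                           # enforce exact symmetry
    return np.diag(S.sum(axis=1)) - S


def objective(Xs, Y, V, Ps, W, L, alpha, beta, lam):
    """Assumed joint objective J (see lead-in); not the paper's formulation."""
    rec = sum(np.linalg.norm(X @ P - V) ** 2 for X, P in zip(Xs, Ps))
    smooth = alpha * np.trace(V.T @ L @ V)
    pred = beta * np.linalg.norm(V @ W - Y) ** 2
    reg = lam * (sum(np.linalg.norm(P) ** 2 for P in Ps) + np.linalg.norm(W) ** 2)
    return rec + smooth + pred + reg


def fit(Xs, Y, dim=10, alpha=0.1, beta=1.0, lam=0.1, n_iter=20, n_neighbors=5, seed=0):
    """Alternating minimization over P_v, W and V; each step is an exact block minimizer."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape[0], len(Xs)
    L = knn_laplacian(np.hstack(Xs), n_neighbors)
    V = rng.standard_normal((n, dim))
    a, U = np.linalg.eigh(m * np.eye(n) + alpha * L)  # fixed across iterations
    history = []
    for _ in range(n_iter):
        # 1) View projections: ridge regression of V on each view.
        Ps = [np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ V) for X in Xs]
        # 2) Tag predictors: ridge regression of Y on V (squared loss stands in for SVM).
        W = np.linalg.solve(beta * V.T @ V + lam * np.eye(dim), beta * V.T @ Y)
        # 3) Representation: solve the Sylvester equation
        #    (m I + alpha L) V + V (beta W W^T) = sum_v X_v P_v + beta Y W^T
        #    by diagonalizing both symmetric coefficient matrices.
        B = beta * W @ W.T
        b, Q = np.linalg.eigh(0.5 * (B + B.T))
        C = sum(X @ P for X, P in zip(Xs, Ps)) + beta * Y @ W.T
        V = U @ ((U.T @ C @ Q) / (a[:, None] + b[None, :])) @ Q.T
        history.append(objective(Xs, Y, V, Ps, W, L, alpha, beta, lam))
    return V, Ps, W, history


def predict_scores(Xs_new, Ps, W):
    """Project each view, average into the shared subspace, then score every tag."""
    V_new = sum(X @ P for X, P in zip(Xs_new, Ps)) / len(Ps)
    return V_new @ W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Xs = [rng.standard_normal((60, 20)), rng.standard_normal((60, 12))]  # two toy "views"
    Y = (rng.random((60, 5)) < 0.3).astype(float)                        # toy binary tag matrix
    V, Ps, W, hist = fit(Xs, Y, dim=8)
    print(f"objective: {hist[0]:.3f} -> {hist[-1]:.3f}")
    print(predict_scores(Xs, Ps, W).shape)
```

Because each step minimizes the assumed J exactly over one block while the others are held fixed, the recorded objective never increases from one iteration to the next. With the hinge loss of an actual SVM, the W-step would require an SVM solver, and the V-step would generally lose this closed form.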
