Semi-Supervised Learning Based Semantic Cross-Media Retrieval

With the advent of the big data era, information has gradually shifted from a single modality to diversified forms such as image, text, video, and audio. As multimedia data grows, the key challenge for cross-media retrieval technology is how to quickly retrieve multimedia data of different modalities that share the same semantics. At present, many cross-media retrieval techniques rely on manually annotated samples for training; as a result, the semantic information in the data cannot be fully exploited, and the required manual annotation is labor-intensive, error-prone, and subjective. To address these problems, this paper proposes a Semi-Supervised learning based Semantic Cross-Media Retrieval (S3CMR) method. Its main advantage is that it fully exploits the relationship between the semantic information of labeled samples and that of unlabeled samples. Simultaneously, we integrate a linear regression term, a correlation analysis term, and a feature selection term into a joint cross-media learning framework; these terms interact with one another and embed richer semantics in the shared subspace. Furthermore, an iterative method with guaranteed convergence is proposed to solve the formulated optimization problem. Experimental results on three publicly available datasets demonstrate that the proposed method outperforms eight state-of-the-art cross-media retrieval methods.
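The abstract does not give the concrete objective, but joint frameworks of this kind typically combine a least-squares regression term with a row-sparse (l2,1-norm) feature-selection regularizer and solve the result iteratively. As a hedged illustration only (the function name, the simplified objective `||XW - Y||_F^2 + lam * ||W||_{2,1}`, and all parameters are assumptions, not the authors' actual S3CMR formulation), the standard iteratively reweighted solver for such a term looks like this:

```python
import numpy as np

def solve_l21_regression(X, Y, lam=0.1, n_iter=50, eps=1e-8):
    """Illustrative solver for min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    The l2,1 norm (sum of row l2 norms of W) drives whole rows of W
    to zero, which acts as feature selection. The nonsmooth norm is
    handled by the usual reweighting trick: at each step, replace it
    with a quadratic surrogate lam * tr(W^T D W), where D is diagonal
    with D_ii = 1 / (2 * ||w_i||_2), then solve the resulting ridge-
    style linear system in closed form.
    """
    d, c = X.shape[1], Y.shape[1]
    D = np.eye(d)                      # initial reweighting matrix
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        # Closed-form update: (X^T X + lam * D) W = X^T Y
        W = np.linalg.solve(XtX + lam * D, XtY)
        # Recompute row norms and the diagonal reweighting matrix;
        # eps guards against division by zero for zeroed-out rows.
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return W
```

Each iteration is a convex quadratic subproblem, which is why this family of alternating updates admits the kind of convergence guarantee the abstract mentions. The full S3CMR objective would additionally carry the correlation-analysis term and a graph term propagating labels to unlabeled samples.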
