HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

The purpose of cross-modal retrieval is to learn the relationships between samples of different modalities, so that a query from one modality can retrieve semantically similar samples from another. Because data from different modalities exhibit heterogeneous low-level features but semantically related high-level features, the central problem of cross-modal retrieval is how to measure similarity across modalities. In this article, we present a novel cross-modal retrieval method named the Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs, as well as from intra-modal pairs sharing the same class label. Specifically, coupled deep fully connected networks map cross-modal feature representations into a common subspace, and a weight-sharing strategy between the two network branches reduces cross-modal heterogeneity. Furthermore, two Siamese CNN models learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real-world datasets demonstrate that the proposed technique achieves substantial improvements over state-of-the-art cross-modal retrieval techniques.
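The coupled-branch design described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, single hidden layer, and random weights are all hypothetical, and cosine similarity in the common subspace stands in for whatever similarity the full model learns. The key idea shown is that each modality keeps its own input layer while the final projection is shared between branches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: raw image/text features and the common subspace.
IMG_DIM, TXT_DIM, HID_DIM, COMMON_DIM = 8, 6, 5, 4

# Each modality gets its own first (modality-specific) layer...
W_img = rng.standard_normal((IMG_DIM, HID_DIM)) * 0.1
W_txt = rng.standard_normal((TXT_DIM, HID_DIM)) * 0.1
# ...but the final projection is shared between the two branches,
# mirroring the weight-sharing strategy that reduces heterogeneity.
W_shared = rng.standard_normal((HID_DIM, COMMON_DIM)) * 0.1

def embed(x, W_branch):
    """Map a raw feature vector into the common subspace."""
    h = np.maximum(x @ W_branch, 0.0)      # modality-specific layer + ReLU
    z = h @ W_shared                        # shared projection layer
    return z / (np.linalg.norm(z) + 1e-8)  # L2-normalize for cosine similarity

img = rng.standard_normal(IMG_DIM)
txt = rng.standard_normal(TXT_DIM)

# With both embeddings in one subspace, cross-modal similarity
# reduces to a dot product (cosine similarity after normalization).
sim = float(embed(img, W_img) @ embed(txt, W_txt))
print(sim)
```

In training, the shared weights would be updated from both modalities' losses, which is what forces the two branches toward a genuinely common representation.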
