Deep semantic cross modal hashing with correlation alignment

Abstract: Hashing has been extensively applied to cross-modal retrieval because of its low storage cost and high efficiency. Deep hashing, which can extract features of multi-modal data well, has recently received increasing research attention. However, most deep hashing methods for cross-modal retrieval neither make full use of semantic label information nor fully mine the correlation between heterogeneous data. In this paper, we propose a Deep Semantic cross-modal hashing with Correlation Alignment (DSCA) method. In DSCA, we design two deep neural networks, one for the image modality and one for the text modality, and learn two hash functions. First, we construct a new similarity measure for multi-label data, which exploits the semantic information and improves retrieval accuracy. At the same time, we preserve the inter-modal similarity of heterogeneous data features, which exploits the semantic correlation. Second, the distributions of heterogeneous data are aligned to better mine the inter-modal correlation. Third, the semantic label information is embedded in the hash layer of the text network, which makes the learned hash matrix more stable and the hash codes more discriminative. Experimental results demonstrate that DSCA outperforms the state-of-the-art methods.
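
The abstract does not give the formulas behind these three components, so the following is only a minimal sketch under stated assumptions, not the DSCA implementation. It assumes that (a) the graded multi-label similarity can be approximated by the cosine similarity between binary label vectors, (b) distribution alignment can be approximated by a CORAL-style loss, that is, the squared Frobenius distance between the feature covariance matrices of the two modalities scaled by 1/(4d^2), and (c) hash codes are obtained by taking the sign of the network outputs. The function names `label_similarity`, `coral_loss`, and `to_hash_code` are hypothetical.

```python
import math


def label_similarity(a, b):
    """Cosine similarity between two binary multi-label vectors.

    Assumption: stands in for the paper's multi-label similarity,
    which the abstract does not define.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sample with no labels shares nothing with any other
    return dot / (norm_a * norm_b)


def covariance(rows):
    """Sample covariance matrix (d x d) of n feature rows of length d."""
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(img_feats, txt_feats):
    """CORAL-style alignment loss (assumed form, not taken from the paper):
    squared Frobenius distance between covariances, scaled by 1 / (4 d^2).
    """
    d = len(img_feats[0])
    cov_img = covariance(img_feats)
    cov_txt = covariance(txt_feats)
    squared_diff = sum(
        (cov_img[i][j] - cov_txt[i][j]) ** 2
        for i in range(d)
        for j in range(d)
    )
    return squared_diff / (4 * d * d)


def to_hash_code(values):
    """Binarize real-valued network outputs into a {-1, +1} hash code."""
    return [1 if v >= 0 else -1 for v in values]
```

With this choice, two samples that share some but not all labels get a similarity strictly between 0 and 1, instead of the binary "shares at least one label" indicator, and the alignment loss is zero exactly when the two modalities have identical feature covariances.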
