Modality-specific matrix factorization hashing for cross-modal retrieval

Cross-modal retrieval has attracted considerable attention over the past few years. Recently, collective matrix factorization was proposed to learn common representations for cross-modal retrieval, based on the assumption that paired data from different modalities should share the same common semantic representation. However, such a unified common representation can inherently sacrifice the modality-specific representation of each modality, because the distributions and representations of different modalities are inconsistent. To mitigate this problem, we propose Modality-specific Matrix Factorization Hashing (MsMFH) via alignment. MsMFH learns a modality-specific semantic representation for each modality and then aligns these representations using correlation information. Specifically, we factorize the original feature representations into individual latent semantic representations, and then align the distributions of these latent representations via an orthogonal transformation. We then embed the class labels into hash-code learning through the latent semantic space, and obtain the hash codes directly by an efficient optimization with a closed-form solution. Extensive experiments on three public datasets demonstrate that the proposed method outperforms many existing cross-modal hashing methods by up to 3% in terms of mean average precision (mAP).
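The pipeline outlined above can be illustrated with a minimal NumPy sketch: per-modality factorization into latent representations, orthogonal alignment of one latent space onto the other, and binarization into hash codes. This is not the authors' MsMFH algorithm. Several choices here are assumptions made for illustration:

- the factorization uses plain alternating ridge least squares;
- the orthogonal transformation is the standard orthogonal Procrustes solution, computed by an SVD;
- codes are obtained by taking the sign;
- the label-embedding step is omitted entirely.

The function names `factorize`, `procrustes`, `hash_codes` and `hamming` are hypothetical.

```python
import numpy as np


def factorize(X, k, iters=30, lam=1e-3, seed=0):
    """Approximate X (d x n) by U @ V, with U: d x k basis and V: k x n latent codes.

    Assumption: alternating ridge least squares; MsMFH's actual objective may differ.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, X.shape[1]))
    I = np.eye(k)
    for _ in range(iters):
        # Ridge update of U: U (V V^T + lam I) = X V^T
        U = np.linalg.solve(V @ V.T + lam * I, V @ X.T).T
        # Ridge update of V: (U^T U + lam I) V = U^T X
        V = np.linalg.solve(U.T @ U + lam * I, U.T @ X)
    return U, V


def procrustes(A, B):
    """Orthogonal R minimising ||A - R @ B||_F (orthogonal Procrustes via SVD of A B^T)."""
    W, _, Zt = np.linalg.svd(A @ B.T)
    return W @ Zt


def hash_codes(V):
    """Binarise latent codes to {-1, +1} by sign (zeros are mapped to +1)."""
    return np.where(V >= 0, 1, -1)


def hamming(b_query, B_db):
    """Hamming distances between one {-1,+1} code of length k and database codes (k x n)."""
    k = b_query.shape[0]
    return (k - b_query @ B_db) // 2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, k = 200, 8
    S = rng.standard_normal((k, n))               # synthetic shared semantics
    X_img = rng.standard_normal((64, k)) @ S      # hypothetical "image" features
    X_txt = rng.standard_normal((32, k)) @ S      # hypothetical "text" features
    _, V_img = factorize(X_img, k)                # modality-specific latent codes
    _, V_txt = factorize(X_txt, k)
    R = procrustes(V_img, V_txt)                  # rotate text latents toward image latents
    B_img, B_txt = hash_codes(V_img), hash_codes(R @ V_txt)
    print(hamming(B_txt[:, 0], B_img)[:5])        # text-to-image Hamming distances
```

Because the alignment matrix is constrained to be orthogonal, it can only rotate or reflect one modality's latent space. Distances within that modality are therefore preserved while its distribution is brought into correspondence with the other modality.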
