HDMFH: Hypergraph Based Discrete Matrix Factorization Hashing for Multimodal Retrieval

In recent years, hashing-based cross-modal retrieval methods have attracted considerable attention for their high retrieval efficiency and low storage cost. However, most existing methods neglect the high-order relationships among data samples. In addition, most of them handle only two modalities, e.g., image and text, and do not address scenarios with more modalities. To address these issues, we propose a novel cross-modal hashing method, Hypergraph Based Discrete Matrix Factorization Hashing (HDMFH), for multimodal retrieval. Unlike most previous approaches, our method, built on hypergraph regularization and matrix factorization, supports cross-modal retrieval over more than two modalities, which is known as multimodal retrieval. Extensive experiments demonstrate that HDMFH outperforms state-of-the-art cross-modal hashing methods.
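To make "hypergraph regularization" concrete, the following is a minimal sketch of one common construction: a k-nearest-neighbour hypergraph and its normalized Laplacian, whose trace term tr(VᵀLV) penalizes latent representations that vary within a hyperedge. The abstract does not state the exact formulation used by HDMFH, so the function names, the kNN hyperedge rule, and the unit hyperedge weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def knn_hypergraph_incidence(X, k=3):
    """Incidence matrix H (n vertices x n hyperedges).

    Assumption for illustration: hyperedge e groups sample e with its
    k nearest neighbours under Euclidean distance.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    H = np.zeros((n, n))
    for e in range(n):
        nbrs = np.argsort(dist[e])[: k + 1]
        H[nbrs, e] = 1.0
        H[e, e] = 1.0  # make sure the centre sample belongs to its own edge
    return H

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian L = I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n_vertices, n_edges = H.shape
    w = np.ones(n_edges) if w is None else np.asarray(w, dtype=float)
    dv = H @ w            # vertex degrees (weighted)
    de = H.sum(axis=0)    # hyperedge degrees (vertices per edge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H) @ np.diag(w / de) @ (H.T * dv_inv_sqrt[None, :])
    return np.eye(n_vertices) - theta

def smoothness(V, L):
    """Regularization term tr(V^T L V); small when V is smooth over hyperedges."""
    return float(np.trace(V.T @ L @ V))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10, 5))   # toy features for one modality
    V = rng.normal(size=(10, 4))   # toy latent representation (one row per sample)
    L = hypergraph_laplacian(knn_hypergraph_incidence(X, k=3))
    print("regularizer value:", smoothness(V, L))
```

In a matrix-factorization hashing pipeline, a term of this kind would typically be added to the reconstruction loss of each modality, with binary codes obtained from the shared latent representation; how HDMFH combines these pieces is not specified in the abstract.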
