Learning discriminative hashing codes for cross-modal retrieval based on multi-view features

Hashing techniques are widely used in retrieval tasks because of their low storage cost and fast processing. Many single-view hashing methods have been studied extensively for information retrieval. However, a single view has limited representation capacity and fails to capture some discriminative information, which limits retrieval performance. In this paper, we represent images and texts with multiple views to enrich the feature information. Our framework exploits the complementary information among the views to learn discriminative, compact hash codes. We propose a discrete hashing learning framework that jointly performs classifier learning and subspace learning and handles multiple search tasks simultaneously. The framework has two stages: kernelization and quantization. The kernelization stage finds a common subspace in which the multi-view features can be fused. The quantization stage learns discriminative unified hash codes. Extensive experiments on single-label datasets (Wiki and MMED) and multi-label datasets (MIRFlickr and NUS-WIDE) indicate that our method outperforms state-of-the-art methods.
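
The two-stage structure described above (kernelize each view, fuse, then quantize to binary codes) can be illustrated with a minimal sketch. This is not the paper's method: the RBF anchor mapping, the mean-distance bandwidth, and especially the random projection are assumptions standing in for the learned subspace and the jointly trained classifier, and the names `rbf_kernelize` and `fuse_and_quantize` are hypothetical.

```python
import numpy as np

def rbf_kernelize(X, anchors):
    """Map samples to RBF similarities against a set of anchor points."""
    # Squared Euclidean distance between every sample and every anchor.
    d2 = (X ** 2).sum(1)[:, None] - 2 * X @ anchors.T + (anchors ** 2).sum(1)[None, :]
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values from rounding
    sigma = np.sqrt(d2.mean())  # assumed bandwidth heuristic, not from the paper
    return np.exp(-d2 / (2 * sigma ** 2))

def fuse_and_quantize(views, n_anchors=8, n_bits=16, seed=0):
    """Kernelize each view, concatenate, project, and take signs as hash bits."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    # Use the same randomly chosen samples as anchors in every view.
    idx = rng.choice(n, size=n_anchors, replace=False)
    feats = [rbf_kernelize(X, X[idx]) for X in views]
    phi = np.hstack(feats)      # fused multi-view representation
    phi = phi - phi.mean(0)     # center so that the sign split is not degenerate
    # ASSUMPTION: a random projection replaces the learned projection here.
    W = rng.standard_normal((phi.shape[1], n_bits))
    return np.where(phi @ W >= 0, 1, -1).astype(np.int8)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    image_view = rng.standard_normal((20, 5))  # toy stand-in for image features
    text_view = rng.standard_normal((20, 3))   # toy stand-in for text features
    codes = fuse_and_quantize([image_view, text_view], n_anchors=4, n_bits=16)
    print(codes.shape)
```

Each row of the result is a code in {-1, +1}, so two items can be compared by Hamming distance. In a supervised method such as the one described, the projection would be learned from labels rather than drawn at random.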
