Global and local semantics-preserving based deep hashing for cross-modal retrieval

Abstract Cross-modal hashing methods map similar data entities from heterogeneous data sources to binary codes with small Hamming distance. However, most existing cross-modal hashing methods learn hash codes from hand-crafted features, which may not yield optimal hash codes or satisfactory performance. Deep cross-modal hashing methods integrate feature learning and hash coding into an end-to-end learning framework and have achieved promising results. However, these deep methods do not adequately preserve discriminative ability or global multilevel similarity during hash learning. In this paper, we propose a global and local semantics-preserving deep hashing method for cross-modal retrieval. More specifically, from an inter-modal view, a large margin is enforced between similar and dissimilar hash codes to learn discriminative hash codes, so that the learned hash codes preserve the local semantic structure. Subsequently, supervised information with global multilevel similarity is introduced to learn semantics-preserving hash codes for each intra-modal view, so that the global semantic structure is preserved in the hash codes. Furthermore, a consistency regularization constraint is added to generate unified hash codes. Finally, the feature learning procedure and the hash coding procedure are integrated into an end-to-end learning framework. To verify the effectiveness of the proposed method, extensive experiments are conducted on several datasets, and the experimental results demonstrate that the proposed method achieves superior performance.
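
To make the inter-modal margin idea concrete, the following is a minimal sketch and not the loss used in the paper. It assumes hash codes in {-1, +1}, a binary similarity matrix, and a contrastive hinge form; the function names `hamming_distance` and `margin_loss` and the `margin` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def hamming_distance(a, b):
    """Count the positions where two equal-length binary codes differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    return int(np.sum(a != b))


def margin_loss(img_codes, txt_codes, similar, margin=2.0):
    """Illustrative hinge-style loss over all image/text code pairs.

    Assumptions (not taken from the paper):
      - codes are rows of {-1, +1} values with the same length k,
      - similar[i, j] is 1 if image i and text j are semantically
        similar and 0 otherwise,
      - similar pairs are penalised by their Hamming distance, and
        dissimilar pairs only when they are closer than `margin`.
    """
    img = np.asarray(img_codes, dtype=float)
    txt = np.asarray(txt_codes, dtype=float)
    sim = np.asarray(similar, dtype=float)
    k = img.shape[1]
    # For {-1, +1} codes, Hamming distance = (k - inner product) / 2.
    dist = 0.5 * (k - img @ txt.T)
    loss = sim * dist + (1.0 - sim) * np.maximum(0.0, margin - dist)
    return float(loss.mean())


if __name__ == "__main__":
    img = [[1, 1, 1, 1]]
    txt = [[1, 1, 1, 1], [-1, -1, -1, -1]]
    # Correct labelling: identical codes are similar, opposite codes are not.
    print(margin_loss(img, txt, [[1, 0]]))
    # Swapped labelling is penalised.
    print(margin_loss(img, txt, [[0, 1]]))
```

In a deep hashing setting the codes would be relaxed network outputs rather than fixed binary vectors; the inner-product form of the distance is used here because it stays differentiable under such a relaxation.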
