Semantic-rebased cross-modal hashing for scalable unsupervised text-visual retrieval

Abstract Learning-based cross-modal hashing has recently attracted growing research interest because of its low computational complexity and memory requirements. Among existing cross-modal techniques, supervised algorithms generally achieve better performance. However, because labeled data is costly to acquire, unsupervised methods are the practical choice for large-scale unlabeled web images. The label-free nature of unsupervised cross-modal hashing prevents models from exploiting the exact semantic similarity between data points. Existing research typically approximates the semantics with a heuristic geometric prior in the original feature space, using pseudo labels or traditional dense graph structures. This introduces heavy bias into the model, because the original features do not fully represent the underlying multi-view data relations, and these two structures can suffer from interference noise or high sensitivity to the number of clusters. To address this problem, we propose a novel unsupervised sparse-graph-based hashing method called Semantic-Rebased Cross-modal Hashing (SRCH). A novel ‘Set-and-Rebase’ process is defined to initialize and update the cross-modal similarity graph of the training data. In particular, we set the graph according to the intra-modal geometry of the features and then alternately rebase it, updating its edges according to the hashing results. We develop an alternating optimization routine that rebases the graph and trains the hashing auto-encoders with closed-form solutions, so that the overall framework is trained efficiently. Experimental results on benchmark datasets demonstrate the superiority of our model over state-of-the-art algorithms.
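
The set-then-rebase alternation described in the abstract can be pictured with a toy loop. The sketch below is only an illustration under stated assumptions, not the SRCH algorithm: the abstract does not give the objective or the closed-form updates, so a simple spectral sign-code step stands in for the hashing auto-encoders, and every function name and parameter (`knn_graph`, `spectral_codes`, `set_and_rebase`, `k`, `n_bits`, `n_rounds`) is hypothetical.

```python
import numpy as np


def pairwise_sq_dist(x):
    """Squared Euclidean distances between all rows of x."""
    sq = (x ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)


def knn_graph(dist, k):
    """Sparse symmetric 0/1 graph linking each point to its k nearest points."""
    n = dist.shape[0]
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)  # never link a point to itself
    idx = np.argsort(d, axis=1, kind="stable")[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), idx.ravel()] = True
    return adj | adj.T  # symmetrize


def spectral_codes(adj, n_bits):
    """Stand-in hashing step (assumption): signs of Laplacian eigenvectors.

    The paper trains hashing auto-encoders instead; this is a placeholder.
    """
    lap = np.diag(adj.sum(axis=1)) - adj.astype(float)
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return np.where(vecs[:, 1:n_bits + 1] >= 0, 1, -1)


def set_and_rebase(img_feats, txt_feats, k=5, n_bits=8, n_rounds=3):
    # "Set": initialize the graph from the intra-modal feature geometry.
    adj = (knn_graph(pairwise_sq_dist(img_feats), k)
           | knn_graph(pairwise_sq_dist(txt_feats), k))
    for _ in range(n_rounds):
        # Hashing step: derive binary codes from the current graph.
        codes = spectral_codes(adj, n_bits)
        # "Rebase": rebuild the edges from Hamming distances between codes.
        hamming = (n_bits - codes @ codes.T) / 2
        adj = knn_graph(hamming, k)
    return codes, adj
```

Calling `set_and_rebase(img, txt)` on two row-aligned feature matrices (one row per image-text pair) returns codes in {-1, +1} and the final sparse graph. The point of the sketch is the control flow: the graph is fixed while the codes are computed, then the codes are fixed while the graph is rebuilt.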
