Semi-Supervised Semantic-Preserving Hashing for Efficient Cross-Modal Retrieval

Cross-modal hashing has recently gained significant popularity as a way to facilitate retrieval across different modalities. For settings where only limited labels are available, this paper presents a novel Semi-Supervised Semantic-Preserving Hashing (S3PH) method for flexible cross-modal retrieval. Most semi-supervised cross-modal hashing works need to predict the labels of unlabeled data. Our approach instead groups the labeled and unlabeled data together and integrates relaxed latent subspace learning with semantic-preserving regularization across different modalities. Accordingly, an efficient relaxed objective function is proposed to learn the latent subspaces for both labeled and unlabeled data. Further, an orthogonal rotation matrix is efficiently learned to transform the latent subspace into the hash space by minimizing the quantization error. Without sacrificing retrieval performance, the proposed S3PH method can benefit retrieval in unsupervised, semi-supervised, and supervised settings. Experimental comparisons with several competitive algorithms demonstrate the effectiveness of the proposed method and its superiority over state-of-the-art methods.
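The abstract does not give S3PH's exact update rules. The rotation step it describes has the same form as iterative quantization (ITQ, the Procrustean approach of Gong and Lazebnik). The minimal sketch below assumes that ITQ-style form and is not the authors' implementation. It also assumes the latent representation `V` (n samples by c bits) has already been learned and zero-centred. The function names `itq_rotation` and `quantization_error` are hypothetical.

The sketch alternates two steps to minimise ||B - VR||_F^2:

- With R fixed, B = sign(VR).
- With B fixed, R solves an orthogonal Procrustes problem through an SVD.

```python
import numpy as np


def itq_rotation(V, n_iter=50, seed=0):
    """ITQ-style sketch (hypothetical helper, not the S3PH reference code).

    V: (n, c) zero-centred latent representation (assumed already learned).
    Returns binary codes B in {-1, +1}^(n x c) and an orthogonal c x c R
    that locally minimise the quantization error ||B - V R||_F^2.
    """
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    # Random orthogonal initialisation via QR of a Gaussian matrix.
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))
    for _ in range(n_iter):
        # Fix R, update B: the element-wise sign minimises the error.
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Fix B, update R: maximise trace(B^T V R) over orthogonal R.
        # If V^T B = U S Wt, the maximiser is R = U Wt.
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    # Final codes for the final rotation.
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return B, R


def quantization_error(B, V, R):
    """Squared Frobenius norm ||B - V R||_F^2."""
    return float(np.linalg.norm(B - V @ R) ** 2)
```

Each half-step can only lower the error, so the loop converges to a local minimum. Codes in {0, 1}, if needed for storage, follow from `(B + 1) / 2`.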
