Semi-supervised semantic factorization hashing for fast cross-modal retrieval

Cross-modal hashing can effectively address large-scale cross-modal retrieval by integrating the advantages of traditional cross-modal analysis and hashing techniques. In cross-modal hashing, preserving semantic correlation is important but challenging. However, current hashing methods do not preserve semantic correlation well in the hash codes: supervised hashing requires labeled data, which is difficult to obtain, while unsupervised hashing cannot effectively learn semantic correlation from multi-modal data. To learn semantic correlation effectively and thereby improve hashing performance, we propose a novel approach, Semi-Supervised Semantic Factorization Hashing (S3FH), for large-scale cross-modal retrieval. The main purpose of S3FH is to improve the semantic labels and factorize them into hash codes. It optimizes a joint framework consisting of three interacting components: semantic factorization, multi-graph learning, and multi-modal correlation. An efficient alternating algorithm is then derived to optimize S3FH. Extensive experiments on two real-world multi-modal datasets demonstrate the effectiveness of S3FH.
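The abstract names the ingredients of S3FH but not its objective, so the following Python/NumPy sketch is only a heavily simplified, hypothetical illustration of two generic ideas it mentions: factorizing a label matrix Y into binary hash codes B and a real-valued basis U by alternating minimization of ||Y - BU||_F^2 + lam * ||U||_F^2, and fast retrieval by Hamming ranking. It is not the S3FH algorithm. It assumes every sample is labeled (no semi-supervised label improvement), omits the multi-graph and multi-modal correlation terms, and the per-modality linear hash functions, the function names, the parameters (`r`, `lam`, `n_iter`) and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def factorize_labels(Y, r, n_iter=20, lam=1e-2, seed=0):
    """Toy alternating minimization of ||Y - B U||_F^2 + lam * ||U||_F^2
    over binary codes B in {-1, +1}^(n x r) and a real basis U (r x c).
    Hypothetical stand-in for 'factorizing labels into hash codes';
    NOT the S3FH objective (no graph or cross-modal correlation terms)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    B = np.where(rng.standard_normal((n, r)) >= 0, 1.0, -1.0)
    history = []
    for _ in range(n_iter):
        # U-step: exact ridge-regression solution given B.
        U = np.linalg.solve(B.T @ B + lam * np.eye(r), B.T @ Y)
        # B-step: exact update of one code bit (column) at a time. With the
        # other bits fixed, ||b u_k||^2 = n * ||u_k||^2 is constant, so the
        # minimizer is b = sign(residual @ u_k).
        for k in range(r):
            rest = [j for j in range(r) if j != k]
            residual = Y - B[:, rest] @ U[rest, :]
            B[:, k] = np.where(residual @ U[k, :] >= 0, 1.0, -1.0)
        history.append(np.linalg.norm(Y - B @ U) ** 2 + lam * np.linalg.norm(U) ** 2)
    return B, U, history


def fit_linear_hash(X, B, lam=1e-2):
    """Assumed per-modality hash function: least-squares W so sign(X W) ~ B."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)


def encode(X, W):
    """Map real-valued features to {-1, +1} codes."""
    return np.where(X @ W >= 0, 1, -1).astype(np.int8)


def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query.
    For +/-1 codes of length r, Hamming distance = (r - <q, b>) / 2."""
    r = db_codes.shape[1]
    dist = (r - db_codes.astype(np.int32) @ query_code.astype(np.int32)) // 2
    return np.argsort(dist, kind="stable"), dist


# Synthetic two-modality toy data (illustrative only).
rng = np.random.default_rng(1)
n, c, r = 200, 5, 16
Y = np.eye(c)[rng.integers(0, c, size=n)]  # single-label one-hot matrix
X_img = Y @ rng.standard_normal((c, 32)) + 0.1 * rng.standard_normal((n, 32))
X_txt = Y @ rng.standard_normal((c, 20)) + 0.1 * rng.standard_normal((n, 20))

B, U, history = factorize_labels(Y, r)
W_img, W_txt = fit_linear_hash(X_img, B), fit_linear_hash(X_txt, B)
img_codes, txt_codes = encode(X_img, W_img), encode(X_txt, W_txt)

# Cross-modal query: a text sample searches the image database.
order, dist = hamming_rank(txt_codes[0], img_codes)
print(dist[order[:5]])
```

Because both the U-step and each single-bit B-step minimize the toy objective exactly with everything else fixed, the recorded objective cannot increase across iterations. Ranking by Hamming distance needs only an integer inner product per database item, which is the property that makes hash-based retrieval fast.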
