Online Cross-Modal Hashing for Web Image Retrieval

Cross-modal hashing (CMH) is an efficient technique for fast retrieval of web image data and has attracted considerable attention recently. However, traditional CMH methods usually rely on batch learning to generate hash functions and codes, which makes them inefficient for web images that typically arrive as a stream. Online learning can be exploited for CMH, but existing online hashing methods still leave two essential problems unsolved: efficient updating of hash codes and analysis of cross-modal correlation. In this paper, we propose Online Cross-modal Hashing (OCMH), which addresses both problems by learning shared latent codes (SLC). In OCMH, hash codes are represented by the permanent SLC and a dynamic transfer matrix. The inefficient updating of hash codes is therefore transformed into the efficient updating of the SLC and the transfer matrix, with a time complexity that is independent of the database size. Moreover, the SLC is shared by all modalities and can thus encode the latent cross-modal correlation, which further improves the overall cross-modal correlation between heterogeneous data. Experimental results on two real-world multi-modal web image datasets, MIR Flickr and NUS-WIDE, demonstrate the effectiveness and efficiency of OCMH for online cross-modal web image retrieval.
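
The decomposition of hash codes into permanent latent codes and a dynamic transfer matrix can be illustrated with a minimal NumPy sketch. It is not the OCMH algorithm itself: the class name `LatentCodeIndex`, the form `sign(S @ T)`, the randomly generated target codes, and the gradient-style update of `T` are all assumptions made for illustration, and the sketch does not model multiple modalities. It only shows why an update that touches the new chunk and the small transfer matrix has a cost that does not grow with the number of items already stored.

```python
import numpy as np


class LatentCodeIndex:
    """Toy index where hash codes are derived as sign(S @ T), never stored."""

    def __init__(self, latent_dim, code_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.chunks = []  # shared latent codes (SLC), written once per chunk
        self.T = rng.standard_normal((latent_dim, code_bits))  # transfer matrix

    def add_chunk(self, slc_new, target_codes, lr=0.1):
        """Store the new chunk's SLC and refine T using that chunk only.

        The gradient step below is a placeholder update rule (an assumption);
        its cost depends on the chunk size, not on the database size.
        """
        self.chunks.append(slc_new)
        residual = target_codes - slc_new @ self.T
        self.T = self.T + lr * slc_new.T @ residual / len(slc_new)

    def codes(self):
        """Materialise +/-1 hash codes for all stored items from S and the current T."""
        S = np.vstack(self.chunks)
        return np.where(S @ self.T >= 0, 1, -1)


rng = np.random.default_rng(1)
index = LatentCodeIndex(latent_dim=8, code_bits=16)
for _ in range(3):  # three streaming chunks of 5 items each
    slc = rng.standard_normal((5, 8))                # hypothetical latent codes
    targets = np.sign(rng.standard_normal((5, 16)))  # hypothetical target codes
    index.add_chunk(slc, targets)

print(index.codes().shape)  # (15, 16)
```

In this sketch the latent codes of earlier chunks are never rewritten. Their hash codes still follow every update, because they are recomputed from the current transfer matrix whenever they are needed.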
