Semantic Topic Multimodal Hashing for Cross-Media Retrieval

Multimodal hashing is essential to cross-media similarity search for its low storage cost and fast query speed. Most existing multimodal hashing methods embedded heterogeneous data into a common low-dimensional Hamming space, and then rounded the continuous embeddings to obtain the binary codes. Yet they usually neglect the inherent discrete nature of hashing for relaxing the discrete constraints, which will cause degraded retrieval performance especially for long codes. For this purpose, a novel Semantic Topic Multimodal Hashing (STMH) is developed by considering latent semantic information in coding procedure. It first discovers clustering patterns of texts and robust factorizes the matrix of images to obtain multiple semantic topics of texts and concepts of images. Then the learned multimodal semantic features are transformed into a common subspace by their correlations. Finally, each bit of unified hash code can be generated directly by figuring out whether a topic or concept is contained in a text or an image. Therefore, the obtained model by STMH is more suitable for hashing scheme as it directly learns discrete hash codes in the coding process. Experimental results demonstrate that the proposed method outperforms several state-of-the-art methods.

[1]  Chris H. Q. Ding,et al.  R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization , 2006, ICML.

[2]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[3]  Guiguang Ding,et al.  Latent semantic sparse hashing for cross-modal similarity search , 2014, SIGIR.

[4]  Zi Huang,et al.  Linear cross-modal hashing for efficient multimedia search , 2013, ACM Multimedia.

[5]  Dacheng Tao,et al.  GoDec: Randomized Lowrank & Sparse Matrix Decomposition in Noisy Case , 2011, ICML.

[6]  Daniel Gooch,et al.  Communications of the ACM , 2011, XRDS.

[7]  Zi Huang,et al.  Inter-media hashing for large-scale retrieval from heterogeneous data sources , 2013, SIGMOD '13.

[8]  Guiguang Ding,et al.  Collective Matrix Factorization Hashing for Multimodal Data , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  HeXiaofei,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation , 2011 .

[10]  R. Rosenfeld Nature , 2009, Otolaryngology--head and neck surgery : official journal of American Academy of Otolaryngology-Head and Neck Surgery.

[11]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[12]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[13]  Nikos Paragios,et al.  Data fusion through cross-modality metric learning using similarity-sensitive hashing , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Punam Bedi,et al.  Trust Based Recommender System for Semantic Web , 2007, IJCAI.

[15]  Xiaojun Wu,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  H. Damasio,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence: Special Issue on Perceptual Organization in Computer Vision , 1998 .

[17]  Chris H. Q. Ding,et al.  Robust nonnegative matrix factorization using L21-norm , 2011, CIKM '11.

[18]  Wen Gao,et al.  Parametric Local Multimodal Hashing for Cross-View Similarity Search , 2013, IJCAI.

[19]  Thomas S. Huang,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation. , 2011, IEEE transactions on pattern analysis and machine intelligence.

[20]  David M. Blei,et al.  Probabilistic topic models , 2012, Commun. ACM.

[21]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[22]  Raghavendra Udupa,et al.  Learning Hash Functions for Cross-View Similarity Search , 2011, IJCAI.