Latent semantic sparse hashing for cross-modal similarity search

Similarity search methods based on hashing for effective and efficient cross-modal retrieval on large-scale multimedia databases with massive text and images have attracted considerable attention. The core problem of cross-modal hashing is how to effectively construct correlation between multi-modal representations which are heterogeneous intrinsically in the process of hash function learning. Analogous to Canonical Correlation Analysis (CCA), most existing cross-modal hash methods embed the heterogeneous data into a joint abstraction space by linear projections. However, these methods fail to bridge the semantic gap more effectively, and capture high-level latent semantic information which has been proved that it can lead to better performance for image retrieval. To address these challenges, in this paper, we propose a novel Latent Semantic Sparse Hashing (LSSH) to perform cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images, and Matrix Factorization to learn the latent concepts from text. Then the learned latent semantic features are mapped to a joint abstraction space. Moreover, an iterative strategy is applied to derive optimal solutions efficiently, and it helps LSSH to explore the correlation between multi-modal representations efficiently and automatically. Finally, the unified hashcodes are generated through the high level abstraction space by quantization. Extensive experiments on three different datasets highlight the advantage of our method under cross-modal scenarios and show that LSSH significantly outperforms several state-of-the-art methods.

[1]  Seungjin Choi,et al.  Sequential Spectral Learning to Hash with Multiple Representations , 2012, ECCV.

[2]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[3]  Zhiwu Lu,et al.  Latent Semantic Learning by Efficient Sparse Coding with Hypergraph Regularization , 2011, AAAI.

[4]  Yi Zhen,et al.  A probabilistic model for multimodal hash function learning , 2012, KDD.

[5]  Fei Wang,et al.  Composite hashing with multiple information sources , 2011, SIGIR.

[6]  Wu-Jun Li,et al.  Double-Bit Quantization for Hashing , 2012, AAAI.

[7]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[8]  Roger Levy,et al.  A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[9]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[10]  Zi Huang,et al.  Inter-media hashing for large-scale retrieval from heterogeneous data sources , 2013, SIGMOD '13.

[11]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[12]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[13]  Michael Elad,et al.  Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries , 2006, IEEE Transactions on Image Processing.

[14]  Rongrong Ji,et al.  Supervised hashing with kernels , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Jian Yang,et al.  Robust sparse coding for face recognition , 2011, CVPR 2011.

[16]  Nuno Vasconcelos,et al.  Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.

[17]  Michael Elad,et al.  Sparse Representation for Color Image Restoration , 2008, IEEE Transactions on Image Processing.

[18]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[19]  Raghavendra Udupa,et al.  Learning Hash Functions for Cross-View Similarity Search , 2011, IJCAI.

[20]  Nikos Paragios,et al.  Data fusion through cross-modality metric learning using similarity-sensitive hashing , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  Jian Sun,et al.  K-Means Hashing: An Affinity-Preserving Quantization Method for Learning Binary Compact Codes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[24]  Shih-Fu Chang,et al.  Semi-supervised hashing for scalable image retrieval , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[25]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[26]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[27]  Wei Liu,et al.  Hashing with Graphs , 2011, ICML.

[28]  Kristen Grauman,et al.  Kernelized locality-sensitive hashing for scalable image search , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[30]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[31]  Yi Zhen,et al.  Co-Regularized Hashing for Multimodal Data , 2012, NIPS.

[32]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[33]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[34]  Jun Wang,et al.  Self-taught hashing for fast similarity search , 2010, SIGIR.

[35]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[36]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.