Neighborhood-Preserving Hashing for Large-Scale Cross-Modal Search

In the cross-modal search literature, most methods employ linear models to learn hash codes that preserve data similarity, measured by Euclidean distance, both within and across modalities. However, data dimensionality can differ greatly across modalities, and the behavior of Euclidean distance (and of similarities derived from it) is known to change drastically with the dimensionality of the space. In this paper, we identify this "variation of dimensionality" problem in cross-modal search, which may harm most distance-based methods. To alleviate its negative effect, we propose a semi-supervised, nonlinear, probabilistic cross-modal hashing method, Neighborhood-Preserving Hashing (NPH). Inspired by t-SNE [4], rather than preserving pairwise distances, NPH learns hash codes that preserve the neighborhood relationships of datapoints. It does so by matching the conditional neighbor distribution of the hash codes, derived from their distances, to the corresponding distributions of the datapoints in the multiple modalities. Experimental results on three real-world datasets demonstrate that the proposed method outperforms state-of-the-art distance-based semi-supervised cross-modal hashing methods, as well as many fully supervised ones.
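
The abstract makes two technical claims. First, Euclidean distances behave differently in spaces of different dimensionality. Second, t-SNE-style conditional neighbor distributions stay comparable where raw distances do not. A small synthetic experiment can illustrate both.

The sketch below is plain Python (standard library only) and is not the NPH algorithm: it learns no hash codes. The following are all assumptions made for illustration:

- the uniform synthetic data;
- the random 200-D Gaussian projection standing in for a second modality;
- the perplexity of 5;
- the use of KL divergence to compare distributions;
- all function names.

The code computes the SNE conditional p_{j|i} = exp(-β_i ||x_i - x_j||^2) / Σ_{k≠i} exp(-β_i ||x_i - x_k||^2), with β_i = 1/(2σ_i^2). Each β_i is chosen by bisection so that its row reaches a target perplexity, as in SNE [13] and t-SNE [4].

```python
import math
import random


def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]


def relative_contrast(X):
    """Mean over query points of (max - min) / min Euclidean distance to the
    other points; values near 0 mean every point looks roughly equidistant."""
    total = 0.0
    for i, row in enumerate(sq_dists(X)):
        others = [math.sqrt(d) for j, d in enumerate(row) if j != i]
        total += (max(others) - min(others)) / min(others)
    return total / len(X)


def cond_row(row, i, beta):
    """SNE conditional p_{j|i} for one point, with beta = 1 / (2 * sigma_i**2)."""
    d_min = min(d for j, d in enumerate(row) if j != i)  # shift avoids underflow
    w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(row)]
    z = sum(w)
    return [v / z for v in w]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(v * math.log(v) for v in p if v > 0.0)


def cond_probs(X, perplexity=None, beta=1.0, iters=64):
    """All rows p_{.|i}. With perplexity=None every point shares one fixed beta;
    otherwise each beta_i is bisected until exp(H(P_i)) matches the perplexity,
    as t-SNE does."""
    P = []
    for i, row in enumerate(sq_dists(X)):
        if perplexity is None:
            P.append(cond_row(row, i, beta))
            continue
        target, b, lo, hi = math.log(perplexity), 1.0, 0.0, math.inf
        for _ in range(iters):
            if entropy(cond_row(row, i, b)) > target:  # too flat: raise beta
                lo = b
                b = b * 2.0 if hi == math.inf else (b + hi) / 2.0
            else:  # too peaked: lower beta
                hi = b
                b = (b + lo) / 2.0
        P.append(cond_row(row, i, b))
    return P


def mean_kl(P, Q, eps=1e-12):
    """Average over rows i of KL(P_i || Q_i), flooring Q at eps."""
    total = 0.0
    for p_row, q_row in zip(P, Q):
        total += sum(p * math.log(p / max(q, eps)) for p, q in zip(p_row, q_row) if p > 0.0)
    return total / len(P)


rng = random.Random(0)
n = 30

# (a) The same kind of data (i.i.d. uniform) in 2-D and in 500-D.
low = [[rng.random() for _ in range(2)] for _ in range(n)]
high = [[rng.random() for _ in range(500)] for _ in range(n)]
print("relative contrast, 2-D:  ", relative_contrast(low))
print("relative contrast, 500-D:", relative_contrast(high))

# (b) Two hypothetical "modalities" of the same n items: the 2-D points and a
# random Gaussian linear map of them into 200-D.
W = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]
mod_b = [[sum(w * x for w, x in zip(w_row, pt)) for w_row in W] for pt in low]

fixed_kl = mean_kl(cond_probs(low), cond_probs(mod_b))  # one shared beta = 1
calib_kl = mean_kl(cond_probs(low, perplexity=5.0), cond_probs(mod_b, perplexity=5.0))
print("KL, shared bandwidth:     ", fixed_kl)
print("KL, perplexity-calibrated:", calib_kl)
```

In this toy setting, squared distances in the 200-D modality are roughly 200 times larger than in the 2-D one. A single shared bandwidth therefore yields very different neighbor distributions for the two modalities. Per-point calibration absorbs that scale, and the two distributions nearly agree. Under the assumptions above, this mirrors why matching neighborhood distributions can be less sensitive to the variation of dimensionality than matching distances.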

[1] Hongyuan Zha, et al. Cross-Modal Similarity Learning via Pairs, Preferences, and Active Supervision, 2015, AAAI.

[2] Yi Zhen, et al. Co-Regularized Hashing for Multimodal Data, 2012, NIPS.

[3] Ming-Hsuan Yang, et al. Locality preserving hashing, 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[4] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.

[5] Wotao Yin, et al. A feasible method for optimization with orthogonality constraints, 2013, Math. Program.

[6] Wenwu Zhu, et al. Deep Multimodal Hashing with Orthogonal Regularization, 2015, IJCAI.

[7] Dongqing Zhang, et al. Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization, 2014, AAAI.

[8] E. Süli, et al. Numerical Solution of Partial Differential Equations, 2014.

[9] Xinbo Gao, et al. Semantic Topic Multimodal Hashing for Cross-Media Retrieval, 2015, IJCAI.

[10] Antonio Torralba, et al. Spectral Hashing, 2008, NIPS.

[11] Zi Huang, et al. Linear cross-modal hashing for efficient multimedia search, 2013, ACM Multimedia.

[12] Yi Zhen, et al. A probabilistic model for multimodal hash function learning, 2012, KDD.

[13] Geoffrey E. Hinton, et al. Stochastic Neighbor Embedding, 2002, NIPS.

[14] Luo Si, et al. Learning to Hash on Partial Multi-Modal Data, 2015, IJCAI.

[15] Yizhou Wang, et al. Quantized Correlation Hashing for Fast Cross-Modal Search, 2015, IJCAI.

[16] Tat-Seng Chua, et al. NUS-WIDE: a real-world web image database from National University of Singapore, 2009, CIVR '09.

[17] Raghavendra Udupa, et al. Learning Hash Functions for Cross-View Similarity Search, 2011, IJCAI.

[18] Zhou Yu, et al. Sparse Multi-Modal Hashing, 2014, IEEE Transactions on Multimedia.

[19] G. Hedstrom, et al. Numerical Solution of Partial Differential Equations, 1966.

[20] Wen Gao, et al. Parametric Local Multimodal Hashing for Cross-View Similarity Search, 2013, IJCAI.

[21] David J. Fleet, et al. Minimal Loss Hashing for Compact Binary Codes, 2011, ICML.

[22] Sean R. Eddy, et al. What is dynamic programming?, 2004, Nature Biotechnology.

[23] Yongdong Zhang, et al. Full-Space Local Topology Extraction for Cross-Modal Retrieval, 2015, IEEE Transactions on Image Processing.

[24] Rongrong Ji, et al. Supervised hashing with kernels, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Shih-Fu Chang, et al. Semi-supervised hashing for scalable image retrieval, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26] S. Sathiya Keerthi, et al. A Fast Dual Algorithm for Kernel Logistic Regression, 2002, 2007 International Joint Conference on Neural Networks.

[27] Kristen Grauman, et al. Kernelized locality-sensitive hashing for scalable image search, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28] Christoph H. Lampert, et al. Learning Multi-View Neighborhood Preserving Projections, 2011, ICML.

[29] Antonio Torralba, et al. 80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Jianmin Wang, et al. Semantics-preserving hashing for cross-view retrieval, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Seungjin Choi, et al. Multi-view anchor graph hashing, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[32] Roger Levy, et al. A new approach to cross-modal multimedia retrieval, 2010, ACM Multimedia.

[33] Jürgen Schmidhuber, et al. Multimodal Similarity-Preserving Hashing, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Guiguang Ding, et al. Collective Matrix Factorization Hashing for Multimodal Data, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Zhou Yu, et al. Cross-Media Hashing with Neural Networks, 2014, ACM Multimedia.

[36] Nikos Paragios, et al. Data fusion through cross-modality metric learning using similarity-sensitive hashing, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.