Paper information - A novel cross-modal hashing algorithm based on multimodal deep learning

A novel cross-modal hashing algorithm based on multimodal deep learning

With the popularity of multi-modal data on the Web, cross-media retrieval has become a hot research topic. Existing cross-modal hashing methods assume that multi-modal features share a latent space, and they embed the heterogeneous data into a joint abstraction space by linear projections. However, these approaches are sensitive to noise in the data, and they cannot make use of unlabelled data or of multi-modal data with missing values, both of which are common in real-world applications. To address these challenges, we propose a novel Multi-modal Deep Learning based Hashing (MDLH) algorithm. In particular, MDLH adopts a deep neural network to encode heterogeneous features into a compact common representation and learns the hash functions based on that common representation. The parameters of the whole model are fine-tuned in a supervised training stage. Experiments on two standard datasets show that our method achieves more effective results than other methods in cross-modal retrieval.
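The general idea of encoding two modalities into a common representation and hashing that representation can be illustrated with a minimal sketch. This is not the MDLH implementation: the abstract gives no architecture or training details, so everything here is an assumption made for illustration, including the helper names (`encode`, `to_hash`, `hamming_rank`), the layer sizes, the tanh activations, the zero threshold, and the random untrained weights that stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, weights):
    """Pass features through a stack of tanh layers (hypothetical encoder)."""
    h = x
    for w in weights:
        h = np.tanh(h @ w)
    return h

def to_hash(common):
    """Binarize a common representation into 0/1 bits by thresholding at zero."""
    return (common > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices ordered by Hamming distance, plus the distances."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Assumed dimensions; a real model would learn these weights from data.
img_dim, txt_dim, hidden, n_bits = 128, 64, 32, 16
img_weights = [rng.standard_normal((img_dim, hidden)),
               rng.standard_normal((hidden, n_bits))]
txt_weights = [rng.standard_normal((txt_dim, hidden)),
               rng.standard_normal((hidden, n_bits))]

# Toy data: five image feature vectors and five text feature vectors.
images = rng.standard_normal((5, img_dim))
texts = rng.standard_normal((5, txt_dim))

# Both modalities end up as codes of the same length, so they are comparable.
img_codes = to_hash(encode(images, img_weights))
txt_codes = to_hash(encode(texts, txt_weights))

# Cross-modal query: rank the image codes against the first text code.
order, dists = hamming_rank(txt_codes[0], img_codes)
print("ranked image indices:", order)
print("Hamming distances:", dists[order])
```

With untrained weights the ranking carries no semantic meaning; the sketch only shows the data flow. In a trained system the two encoders would be optimized so that related image and text items receive nearby codes.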
