Paper Information - Cross-Media Semantic Matching via Sparse Neural Network Pre-trained by Deep Restricted Boltzmann Machines

Cross-media retrieval has attracted considerable attention and has become an increasingly worthwhile research direction in information retrieval. Unlike many related works, which perform retrieval by mapping heterogeneous data into a common representation subspace using a pair of projection matrices, we feed multi-modal media data into a neural network model, a deep sparse neural network pre-trained by restricted Boltzmann machines, which outputs a semantic understanding of each input for semantic matching (RSNN-SM). Consequently, heterogeneous modality data are represented by their top-level semantic outputs, and cross-media retrieval is performed by measuring their semantic similarities. Experimental results on several real-world datasets show that RSNN-SM achieves the best performance, outperforming state-of-the-art approaches.
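The matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each modality's network ends in a softmax over a shared set of semantic classes and that similarity is measured with cosine similarity; the abstract does not specify either choice, so both are assumptions. The function names and the example scores are hypothetical, and the pre-trained networks themselves are not modeled here.

```python
import math

def softmax(scores):
    """Turn raw top-layer scores into a probability vector over semantic classes."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity(u, v):
    """Cosine similarity between two semantic vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_semantics(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least semantically similar."""
    scored = [(cosine_similarity(query_vec, c), i)
              for i, c in enumerate(candidate_vecs)]
    scored.sort(key=lambda pair: -pair[0])
    return [i for _, i in scored]

# Hypothetical top-layer scores over three semantic classes.
# In the described approach these would come from the per-modality networks.
image_query = softmax([2.0, 0.1, -1.0])
text_candidates = [
    softmax([-1.0, 0.2, 2.5]),
    softmax([1.8, 0.0, -0.5]),
    softmax([0.0, 2.0, 0.0]),
]

ranking = rank_by_semantics(image_query, text_candidates)
print(ranking)
```

Because both modalities are compared only through their semantic output vectors, no shared projection subspace is needed in this sketch; an image query and a text candidate are comparable as long as both vectors range over the same classes.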
