论文信息 - Discovering Semantic Vocabularies for Cross-Media Retrieval

Discovering Semantic Vocabularies for Cross-Media Retrieval

This paper proposes a data-driven approach for cross-media retrieval by automatically learning its underlying semantic vocabulary. Different from the existing semantic vocabularies, which are manually pre-defined and annotated, we automatically discover the vocabulary concepts and their annotations from multimedia collections. To this end, we apply a probabilistic topic model on the text available in the collection to extract its semantic structure. Moreover, we propose a learning to rank framework, to effectively learn the concept classifiers from the extracted annotations. We evaluate the discovered semantic vocabulary for cross-media retrieval on three datasets of image/text and video/text pairs. Our experiments demonstrate that the discovered vocabulary does not require any manual labeling to outperform three recent alternatives for cross-media retrieval.

[1] Yueting Zhuang,et al. Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval , 2014, ACM Multimedia.

[2] Roger Levy,et al. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Cees Snoek,et al. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events , 2014, ACM Multimedia.

[5] David M. Blei,et al. Probabilistic topic models , 2012, Commun. ACM.

[6] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[7] Cees Snoek,et al. Video2Sentence and vice versa , 2013, MM '13.

[8] Ruifan Li,et al. Cross-modal Retrieval with Correspondence Autoencoder , 2014, ACM Multimedia.

[9] Ishwar K. Sethi,et al. Multimedia content processing through cross-modal association , 2003, MULTIMEDIA '03.

[10] Zi Huang,et al. Inter-media hashing for large-scale retrieval from heterogeneous data sources , 2013, SIGMOD '13.

[11] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[12] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[13] PerronninFlorent,et al. Good Practice in Large-Scale Learning for Image Classification , 2014 .

[14] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15] Guiguang Ding,et al. Latent semantic sparse hashing for cross-modal similarity search , 2014, SIGIR.

[16] Alexander C. Berg,et al. Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.

[17] Ivan Laptev,et al. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Georges Quénot,et al. TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics , 2011, TRECVID.

[19] Thorsten Joachims,et al. Optimizing search engines using clickthrough data , 2002, KDD.

[20] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.

[21] Andrew Y. Ng,et al. Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[22] R. Manmatha,et al. Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[23] Alberto Del Bimbo,et al. A Cross-media Model for Automatic Image Annotation , 2014, ICMR.

[24] Roger Levy,et al. A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.

[25] Cees Snoek,et al. Recommendations for recognizing video events by concept vocabularies , 2014, Comput. Vis. Image Underst..

[26] Thomas Mensink,et al. Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[27] Yueting Zhuang,et al. Cross-media semantic representation via bi-directional learning to rank , 2013, ACM Multimedia.

[28] Michael I. Jordan,et al. Modeling annotated data , 2003, SIGIR.