Multimodal News Article Analysis

The intersection of Computer Vision and Natural Language Processing has been a hot topic of research in recent years, with results that were unthinkable only a few years ago. In view of this progress, we want to highlight online news articles as a potential next step for this area of research. The rich interrelations of text, tags, images and videos, together with a vast corpus of general knowledge, make them an exciting benchmark for high-capacity models such as deep neural networks. In this paper we present a series of tasks and baseline approaches that leverage corpora such as the BreakingNews dataset.

Arnau Ramisa
