[Figure. Panel labels: Album; Description for Images in Isolation & in Sequences; Caption in Sequence; Storytelling; Story 1 to Story 4; Re-telling; Preferred Photo Sequence]
Ross B. Girshick | Ting-Hao 'Kenneth' Huang | Jacob Devlin | Pushmeet Kohli | C. L. Zitnick | Dhruv Batra | Devi Parikh | Francis Ferraro | N. Mostafazadeh | Ishan Misra | Aishwarya Agrawal | Lucy Vanderwende | Michel Galley | Margaret Mitchell | Xiaodong He
[1] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[2] Francis Ferraro, et al. A Survey of Current Datasets for Vision and Language Research, 2015, EMNLP.
[3] Sanja Fidler, et al. Skip-Thought Vectors, 2015, NIPS.
[4] Wei Xu, et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question, 2015, NIPS.
[5] Richard S. Zemel, et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[6] Geoffrey Zweig, et al. Language Models for Image Captioning: The Quirks and What Works, 2015, ACL.
[7] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[8] David A. Shamma, et al. The New Data and New Challenges in Multimedia Research, 2015, ArXiv.
[9] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[10] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Mario Fritz, et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, 2014, NIPS.
[13] P. Wiessner. Embers of society: Firelight talk among the Ju/’hoansi Bushmen, 2014, Proceedings of the National Academy of Sciences.
[14] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[15] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.
[16] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[17] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[18] Fei-Fei Li, et al. Video Event Understanding Using Natural Language Descriptions, 2013, 2013 IEEE International Conference on Computer Vision.
[19] Frank Keller, et al. Image Description using Visual Dependency Representations, 2013, EMNLP.
[20] Ali Farhadi, et al. Recognition using visual phrases, 2011, CVPR 2011.
[21] Chin-Yew Lin, et al. Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, 2004, ACL.
[22] Yejin Choi, et al. Déjà Image-Captions: A Corpus of Expressive Descriptions in Repetition, 2015, NAACL.
[23] Li Fei-Fei, et al. Deep visual-semantic alignments for generating image descriptions, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).