Xinlei Chen | Saurabh Gupta | C. Lawrence Zitnick | Piotr Dollár | Hao Fang | Tsung-Yi Lin | Ramakrishna Vedantam
[1] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[3] David A. Forsyth, et al. Learning the semantics of words and pictures, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[4] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[5] R. Manmatha, et al. A Model for Learning the Semantics of Pictures, 2003, NIPS.
[6] David A. Forsyth, et al. Matching Words and Pictures, 2003, J. Mach. Learn. Res.
[7] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[8] Philipp Koehn, et al. Re-evaluating the Role of Bleu in Machine Translation Research, 2006, EACL.
[9] Paul Clough, et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems, 2006.
[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[11] Cyrus Rashtchian, et al. Every Picture Tells a Story: Generating Sentences from Images, 2010, ECCV.
[12] Yejin Choi, et al. Baby talk: Understanding and generating simple image descriptions, 2011, CVPR 2011.
[13] Yiannis Aloimonos, et al. Corpus-Guided Sentence Generation of Natural Images, 2011, EMNLP.
[14] Vicente Ordonez, et al. Im2Text: Describing Images Using 1 Million Captioned Photographs, 2011, NIPS.
[15] C. V. Jawahar, et al. Choosing Linguistics over Vision to Describe Images, 2012, AAAI.
[16] Yejin Choi, et al. Collective Generation of Natural Image Descriptions, 2012, ACL.
[17] Karl Stratos, et al. Midge: Generating Image Descriptions From Computer Vision Detections, 2012, EACL.
[18] Gemma Boleda, et al. Distributional Semantics in Technicolor, 2012, ACL.
[19] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[20] Frank Keller, et al. Image Description using Visual Dependency Representations, 2013, EMNLP.
[21] Peter Young, et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics, 2013, J. Artif. Intell. Res.
[22] Angeliki Lazaridou, et al. Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world, 2014, ACL.
[23] Alon Lavie, et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language, 2014, WMT@ACL.
[24] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.
[25] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.
[26] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[27] Frank Keller, et al. Comparing Automatic Evaluation Measures for Image Description, 2014, ACL.
[28] Yejin Choi, et al. TreeTalk: Composition and Compression of Trees for Image Descriptions, 2014, TACL.
[29] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[30] Armand Joulin, et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping, 2014, NIPS.
[31] Wei Xu, et al. Explain Images with Multimodal Recurrent Neural Networks, 2014, ArXiv.
[32] Lorenzo Torresani, et al. AutoCaption: Automatic caption generation for personal photos, 2014, IEEE Winter Conference on Applications of Computer Vision.
[33] Eugene Charniak, et al. Nonparametric Method for Data-driven Image Captioning, 2014, ACL.
[34] Priyanka Jadhav, et al. Automatic Caption Generation for News Images, 2014.
[35] Xinlei Chen, et al. Learning a Recurrent Visual Representation for Image Caption Generation, 2014, ArXiv.
[36] Svetlana Lazebnik, et al. Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections, 2014, ECCV.
[37] Ruslan Salakhutdinov, et al. Multimodal Neural Language Models, 2014, ICML.
[38] Ronan Collobert, et al. Simple Image Description Generator via a Linear Phrase-Based Approach, 2014, ICLR.
[39] Ronan Collobert, et al. Phrase-based Image Captioning, 2015, ICML.
[40] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Yejin Choi, et al. Déjà Image-Captions: A Corpus of Expressive Descriptions in Repetition, 2015, NAACL.
[43] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Angeliki Lazaridou, et al. Combining Language and Vision with a Multimodal Skip-gram Model, 2015, NAACL.
[45] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2015, CVPR.
[46] Trevor Darrell, et al. Long-term recurrent convolutional networks for visual recognition and description, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).