论文信息 - A survey of evolution of image captioning techniques - 字舞流文

A survey of evolution of image captioning techniques

Akshi Kumar | Shivali Goel | Akshi Kumar | Shivali Goel

[1] Ehud Reiter,et al. Book Reviews: Building Natural Language Generation Systems , 2000, CL.

[2] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.

[3] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4] Yansong Feng,et al. Automatic Caption Generation for News Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Paul Clough,et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems , 2006 .

[6] C. V. Jawahar,et al. Choosing Linguistics over Vision to Describe Images , 2012, AAAI.

[7] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[8] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.

[9] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[10] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[11] Anja Belz,et al. An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems , 2009, CL.

[12] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.

[14] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.

[15] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Fei-Fei Li,et al. Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, CVPR.

[17] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.

[18] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.

[19] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[20] David A. Forsyth,et al. Describing objects by their attributes , 2009, CVPR.

[21] Xinlei Chen,et al. Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.

[24] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[26] Lukás Burget,et al. Strategies for training large scale neural network language models , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[27] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Yiannis Aloimonos,et al. Corpus-Guided Sentence Generation of Natural Images , 2011, EMNLP.

[29] Dumitru Erhan,et al. Scalable, High-Quality Object Detection , 2014, ArXiv.

[30] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.

[31] Geoffrey Zweig,et al. Language Models for Image Captioning: The Quirks and What Works , 2015, ACL.

[32] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[34] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[35] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.

[36] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.

[37] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.

[38] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[39] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[40] Liang Lin,et al. I2T: Image Parsing to Text Description , 2010, Proceedings of the IEEE.

[41] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[42] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..

[43] Yejin Choi,et al. TreeTalk: Composition and Compression of Trees for Image Descriptions , 2014, TACL.

[44] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.