Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
Lisa Anne Hendricks | Subhashini Venugopalan | Marcus Rohrbach | Raymond J. Mooney | Kate Saenko | Trevor Darrell
[1] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[2] Trevor Darrell, et al. Long-term recurrent convolutional networks for visual recognition and description, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[4] Yejin Choi, et al. Baby talk: Understanding and generating simple image descriptions, 2011, CVPR 2011.
[5] Subhashini Venugopalan, et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks, 2014, NAACL.
[6] Ruslan Salakhutdinov, et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, 2014, ArXiv.
[7] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, IEEvaluation@ACL.
[8] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[9] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[10] Christoph H. Lampert, et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Fei-Fei Li, et al. What Does Classifying More Than 10,000 Image Categories Tell Us?, 2010, ECCV.
[12] Geoffrey Zweig, et al. Language Models for Image Captioning: The Quirks and What Works, 2015, ACL.
[13] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[14] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Cyrus Rashtchian, et al. Collecting Image Annotations Using Amazon’s Mechanical Turk, 2010, Mturk@HLT-NAACL.
[16] Bernt Schiele, et al. What helps where – and why? Semantic relatedness for knowledge transfer, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[17] Kristen Grauman, et al. Relative attributes, 2011, 2011 International Conference on Computer Vision.
[18] Bart Thomee, et al. Overview of the ImageCLEF 2012 Flickr Photo Annotation and Retrieval Task, 2012, CLEF.
[19] Trevor Darrell, et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition, 2013, 2013 IEEE International Conference on Computer Vision.
[20] Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.
[21] Mark J. Huiskes, et al. The MIR Flickr retrieval evaluation, 2008, MIR '08.
[22] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[23] Wei Xu, et al. Learning Like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[25] Larry S. Davis, et al. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[26] Kate Saenko, et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild, 2014, COLING.
[27] Trevor Darrell, et al. Sequence to Sequence -- Video to Text, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Trevor Darrell, et al. LSDA: Large Scale Detection through Adaptation, 2014, NIPS.
[29] Kate Saenko, et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge, 2013, AAAI.
[30] Geoffrey Zweig, et al. From captions to visual concepts and back, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[32] William B. Dolan, et al. Collecting Highly Parallel Data for Paraphrase Evaluation, 2011, ACL.
[33] Bernt Schiele, et al. The Long-Short Story of Movie Description, 2015, GCPR.
[34] Qi Wu, et al. Image Captioning with an Intermediate Attributes Layer, 2015, ArXiv.
[35] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Andrew Y. Ng, et al. Zero-Shot Learning Through Cross-Modal Transfer, 2013, NIPS.
[37] Marc'Aurelio Ranzato, et al. DeViSE: A Deep Visual-Semantic Embedding Model, 2013, NIPS.
[38] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[39] Chunhua Shen, et al. What Value Do Explicit High Level Concepts Have in Vision to Language Problems?, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).