Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
暂无分享,去创建一个
Kate Saenko | Raymond J. Mooney | Subhashini Venugopalan | Lisa Anne Hendricks | Subhashini Venugopalan | R. Mooney | Kate Saenko
[1] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[2] Christopher Joseph Pal,et al. Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research , 2015, ArXiv.
[3] Christopher Joseph Pal,et al. Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.
[4] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[5] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[6] Hermann Ney,et al. LSTM Neural Networks for Language Modeling , 2012, INTERSPEECH.
[7] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[8] Xu Wei,et al. Learning Like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[9] Trevor Darrell,et al. Captioning Images with Diverse Objects , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Kristen Grauman,et al. Relative attributes , 2011, 2011 International Conference on Computer Vision.
[12] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[13] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[14] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[15] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Yi Yang,et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[18] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[19] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[20] Oleksandr Makeyev,et al. Neural network with ensembles , 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).
[21] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[22] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[24] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[25] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[26] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Christoph H. Lampert,et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[29] Kate Saenko,et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge , 2013, AAAI.
[30] Cordelia Schmid,et al. Label-Embedding for Attribute-Based Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[31] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[32] Bernt Schiele,et al. What helps where – and why? Semantic relatedness for knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[33] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[34] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[35] Yoshua Bengio,et al. On Using Monolingual Corpora in Neural Machine Translation , 2015, ArXiv.
[36] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[37] Trevor Darrell,et al. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.