Automatic video description generation via LSTM with joint two-stream encoding
暂无分享,去创建一个
[1] Lior Wolf,et al. Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation , 2014, ArXiv.
[2] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[3] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[4] Rada Mihalcea,et al. Text Mining for Automatic Image Tagging , 2010, COLING.
[5] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[6] Bernt Schiele,et al. The Long-Short Story of Movie Description , 2015, GCPR.
[7] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[8] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[9] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[10] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[13] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Yorick Wilks,et al. NLP for Indexing and Retrieval of Captioned Photographs , 2003, EACL.
[16] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..
[17] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[18] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[19] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[20] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[21] Liang Lin,et al. I2T: Image Parsing to Text Description , 2010, Proceedings of the IEEE.