暂无分享,去创建一个
Wei Wang | Tieniu Tan | Junbo Wang | Liang Wang | Yan Huang
[1] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[2] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[3] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[4] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[5] W. Marsden. I and J , 2012 .
[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[7] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[8] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Yi Yang,et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] P. Cochat,et al. Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.
[12] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[14] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[15] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[16] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[17] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[18] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Christopher Joseph Pal,et al. Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.
[20] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.
[21] Phil Blunsom,et al. Learning to Transduce with Unbounded Memory , 2015, NIPS.
[22] Rita Cucchiara,et al. Hierarchical Boundary-Aware Neural Encoder for Video Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[25] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.
[27] Dumitru Erhan,et al. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Kate Saenko,et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge , 2013, AAAI.
[29] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[30] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[31] Yueming Lu,et al. A Multi-modal Clustering Method for Web Videos , 2012, ISCTCS.
[32] Kumar Krishna Agrawal,et al. Recurrent Memory Addressing for Describing Videos , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[33] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[34] Yuan Yao,et al. Simulating human saccadic scanpaths on natural images , 2011, CVPR 2011.
[35] Jason Weston,et al. Memory Networks , 2014, ICLR.
[36] Yaswanth Kumar Avulapati,et al. MULTIMODAL FUSION FOR VIDEO SEARCH RERANKING , 2013 .
[37] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[38] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[39] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[40] Phil Blunsom,et al. Recurrent Continuous Translation Models , 2013, EMNLP.
[41] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[43] Pushmeet Kohli,et al. Memory-augmented Attention Modelling for Videos , 2016, ArXiv.
[44] Xiang Zhang,et al. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems , 2015, ICLR.
[45] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[46] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[47] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Jason Weston,et al. Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.
[49] Tomas Mikolov,et al. Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.