End-to-end let's play commentary generation using multi-modal video representations
暂无分享,去创建一个
[1] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[2] Xinlei Chen,et al. Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Subhransu Maji,et al. Deep convolutional filter banks for texture recognition and segmentation , 2014, ArXiv.
[4] Josef Kittler,et al. Generating commentaries for tennis videos , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).
[5] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[7] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.
[8] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[10] Jorma Laaksonen,et al. Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation , 2016, ACM Multimedia.
[11] Zhigang Deng,et al. Analysis of emotion recognition using facial expressions, speech and multimodal information , 2004, ICMI '04.
[12] Yi Yang,et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Nikos Pelekis,et al. DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis , 2017, *SEMEVAL.
[15] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.
[16] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[17] Matthew Guzdial,et al. Towards Automated Let's Play Commentary , 2018, AIIDE Workshops.
[18] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[20] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[21] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[22] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[23] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[24] E. B. Newman,et al. A Scale for the Measurement of the Psychological Magnitude Pitch , 1937 .
[25] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.