Attention-based Visual-Audio Fusion for Video Caption Generation
暂无分享,去创建一个
[1] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[4] Jia Chen,et al. Describing Videos using Multi-modal Fusion , 2016, ACM Multimedia.
[5] Zhaoxiang Zhang,et al. Integrating both Visual and Audio Cues for Enhanced Video Caption , 2017, AAAI.
[6] Christopher Joseph Pal,et al. Video Description Generation Incorporating Spatio-Temporal Features and a Soft-Attention Mechanism , 2015, ArXiv.
[7] Dumitru Erhan,et al. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[9] Di Guo,et al. Cross-Modal Zero-Shot-Learning for Tactile Object Recognition , 2020, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[10] D. Sosin,et al. An outbreak of cryptosporidiosis from fresh-pressed apple cider. , 1994, JAMA.
[11] Hermann Ney,et al. Computing Mel-frequency cepstral coefficients on the power spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[12] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[13] Laura Custer,et al. Cordon-Bleu Is an Actin Nucleation Factor and Controls Neuronal Morphology , 2007, Cell.
[14] Dit-Yan Yeung,et al. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.
[15] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[16] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[17] Xu Jia,et al. Guiding Long-Short Term Memory for Image Caption Generation , 2015, ArXiv.
[18] Alon Lavie,et al. The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.
[19] Jiebo Luo,et al. Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] John R. Hershey,et al. Attention-Based Multimodal Fusion for Video Description , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).