Watch It Twice: Video Captioning with a Refocused Video Encoder
暂无分享,去创建一个
Shafiq R. Joty | Shafiq Joty | Jianfei Cai | Jiuxiang Gu | Xiangxi Shi | Jianfei Cai | Jiuxiang Gu | Xiangxi Shi
[1] Gang Wang,et al. Unpaired Image Captioning by Language Pivoting , 2018, ECCV.
[2] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[3] Rita Cucchiara,et al. Hierarchical Boundary-Aware Neural Encoder for Video Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[5] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[6] Tieniu Tan,et al. M3: Multimodal Memory Modelling for Video Captioning , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Liang Lin,et al. Interpretable Video Captioning via Trajectory Structured Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[9] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Gang Wang,et al. Stack-Captioning: Coarse-to-Fine Learning for Image Captioning , 2017, AAAI.
[11] Xin Wang,et al. Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning , 2018, NAACL.
[12] Shin'ichi Satoh,et al. Consensus-based Sequence Training for Video Captioning , 2017, ArXiv.
[13] Qingming Huang,et al. Less Is More: Picking Informative Frames for Video Captioning , 2018, ECCV.
[14] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Zhou Su,et al. Weakly Supervised Dense Video Captioning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[17] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[18] Tao Mei,et al. Exploring Visual Relationship for Image Captioning , 2018, ECCV.
[19] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[21] Jianfei Cai,et al. Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction , 2018, Neurocomputing.
[22] Gang Wang,et al. Unpaired Image Captioning via Scene Graph Alignments , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Kate Saenko,et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge , 2013, AAAI.
[24] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[25] Subhashini Venugopalan,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[26] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Xuelong Li,et al. Video Captioning with Tube Features , 2018, IJCAI.
[28] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[30] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[32] Bowen Zhou,et al. A Structured Self-attentive Sentence Embedding , 2017, ICLR.
[33] Xin Wang,et al. Video Captioning via Hierarchical Reinforcement Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[34] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[35] Ramakanth Pasunuru,et al. Reinforced Video Captioning with Entailment Rewards , 2017, EMNLP.
[36] Chuang Gan,et al. Video Captioning with Multi-Faceted Attention , 2016, TACL.
[37] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[39] Yongkang Wong,et al. A Fine-Grained Spatial-Temporal Attention Model for Video Captioning , 2018, IEEE Access.
[40] Sheng Liu,et al. SibNet: Sibling Convolutional Encoder for Video Captioning , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[41] Boqing Gong,et al. End-to-End Video Captioning With Multitask Reinforcement Learning , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[42] Wei Liu,et al. Reconstruction Network for Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[43] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Yang Yang,et al. Bidirectional Long-Short Term Memory for Video Description , 2016, ACM Multimedia.
[45] Jianfei Cai,et al. Scene Graph Generation With External Knowledge and Image Reconstruction , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[47] Heng Tao Shen,et al. Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning , 2017, IJCAI.