Sketch, Ground, and Refine: Top-Down Dense Video Captioning
暂无分享,去创建一个
Yuan He | Shizhe Chen | Qi Wu | Chaorui Deng | Da Chen | Qi Wu | Yuan He | Shizhe Chen | Chaorui Deng | Da Chen
[1] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[2] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[3] Wei Liu,et al. Reconstruction Network for Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Tao Mei,et al. Jointly Localizing and Describing Events for Dense Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[7] Masaaki Nagata,et al. SODA: Story Oriented Dense Video Captioning Evaluation Framework , 2020, ECCV.
[8] Yitian Yuan,et al. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Tao Mei,et al. Video Captioning with Transferred Semantic Attributes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Vaibhava Goel,et al. Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Heng Tao Shen,et al. Deliberate Attention Networks for Image Captioning , 2019, AAAI.
[16] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Luowei Zhou,et al. End-to-End Dense Video Captioning with Masked Transformer , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[18] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Yu Qiao,et al. Find and Focus: Retrieve and Localize Video Events with Natural Language Queries , 2018, ECCV.
[21] Bohyung Han,et al. Streamlined Dense Video Captioning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Shih-Fu Chang,et al. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Bohyung Han,et al. Local-Global Video-Text Interactions for Temporal Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Wei Li,et al. CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016 , 2016, ArXiv.
[26] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[27] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[28] Tieniu Tan,et al. M3: Multimodal Memory Modelling for Video Captioning , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[30] Chenliang Xu,et al. Towards Automatic Learning of Procedures From Web Instructional Videos , 2017, AAAI.
[31] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[32] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] Runhao Zeng,et al. Dense Regression Network for Video Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Mohit Bansal,et al. MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning , 2020, ACL.
[35] Yi Yang,et al. Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Bo Dai,et al. Move Forward and Tell: A Progressive Generator of Video Descriptions , 2018, ECCV.
[37] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Ming Yang,et al. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation , 2018, ECCV.
[39] Wei Liu,et al. Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[41] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[42] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[43] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.