DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization
暂无分享,去创建一个
Long Chen | Jun Xiao | Chujie Lu | Xiaolin Li | Chilie Tan | Jun Xiao | Xiaolin Li | Long Chen | Chujie Lu | C. Tan
[1] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Jiebo Luo,et al. Localizing Natural Language in Videos , 2019, AAAI.
[3] Yahong Han,et al. Multi-modal Circulant Fusion for Video-to-Language and Backward , 2018, IJCAI.
[4] Quoc V. Le,et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension , 2018, ICLR.
[5] Xiao Liu,et al. Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos , 2019, AAAI.
[6] Richard Socher,et al. DCN+: Mixed Objective and Deep Residual Coattention for Question Answering , 2017, ICLR.
[7] Hei Law,et al. CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.
[8] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[9] Jiebo Luo,et al. Exploiting Temporal Relationships in Video Moment Localization with Natural Language , 2019, ACM Multimedia.
[10] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[11] Xingyi Zhou,et al. Objects as Points , 2019, ArXiv.
[12] Qi Tian,et al. CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Xingyi Zhou,et al. Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Ramakant Nevatia,et al. MAC: Mining Activity Concepts for Language-Based Temporal Localization , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[15] Qi Tian,et al. Cross-modal Moment Localization in Videos , 2018, ACM Multimedia.
[16] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[17] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.
[18] Richard Socher,et al. Dynamic Coattention Networks For Question Answering , 2016, ICLR.
[19] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[20] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[21] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[22] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[23] Shih-Fu Chang,et al. Online Detection of Action Start in Untrimmed, Streaming Videos , 2018, ECCV.
[24] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.
[26] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[28] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[29] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[30] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[31] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Tat-Seng Chua,et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.
[34] Trevor Darrell,et al. Localizing Moments in Video with Temporal Language , 2018, EMNLP.
[35] Long Chen,et al. Counterfactual Critic Multi-Agent Training for Scene Graph Generation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Long Chen,et al. Video Question Answering via Attribute-Augmented Attention Network Learning , 2017, SIGIR.
[37] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Liang Wang,et al. Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).