Memory-Guided Semantic Learning Network for Temporal Sentence Grounding
Xiaoye Qu | Xing Di | Yu Cheng | Pan Zhou | Daizong Liu | Zichuan Xu