D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Ruizhi Qiao | Xingwu Sun | Su He | Taian Guo | Xiujun Shu | Wei Wen | Bei Gan | Hanjun Li
[1] Ya Zhang,et al. Constraint and Union for Partially-Supervised Temporal Sentence Grounding , 2023, ArXiv.
[2] Yuechen Wang,et al. Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding , 2022, EMNLP.
[3] Yang Liu,et al. Weakly Supervised Video Moment Localization with Contrastive Negative Sample Mining , 2022, AAAI.
[4] Yuxin Peng,et al. Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Yu-Gang Jiang,et al. Video Moment Retrieval from Text Queries via Single Frame Annotation , 2022, SIGIR.
[6] C. Schmid,et al. TubeDETR: Spatio-Temporal Video Grounding with Transformers , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ke Yan,et al. SIOD: Single Instance Annotated Per Category Per Image for Object Detection , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Tat-Seng Chua,et al. Video Moment Retrieval With Cross-Modal Neural Architecture Search , 2022, IEEE Transactions on Image Processing.
[9] Tianhao Li,et al. Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding , 2021, AAAI.
[10] Shiwei Zhang,et al. Support-Set Based Cross-Supervision for Video Grounding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[11] Shaogang Gong,et al. Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Yu-Gang Jiang,et al. Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Rui Qiao,et al. Interventional Video Grounding with Dual Contrastive Learning , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Zhengjun Zha,et al. Structured Multi-Level Interaction Network for Video Moment Localization via Language Query , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Heng Tao Shen,et al. Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Wei Ji,et al. Boundary Proposal Network for Two-Stage Natural Language Video Localization , 2021, AAAI.
[17] Mingsheng Long,et al. Self-Tuning for Data-Efficient Deep Learning , 2021, ICML.
[18] Yongdong Zhang,et al. Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding , 2021, IEEE Transactions on Image Processing.
[19] Bohyung Han,et al. Local-Global Video-Text Interactions for Temporal Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Zhou Yu,et al. Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos , 2020, ArXiv.
[21] Wenhan Luo,et al. Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video , 2020, ArXiv.
[22] Jiebo Luo,et al. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language , 2019, AAAI.
[23] Zhou Zhao,et al. Weakly-Supervised Video Moment Retrieval via Semantic Completion Network , 2019, AAAI.
[24] Thomas Wolf,et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.
[25] Kate Saenko,et al. LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval , 2019, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[26] Wenhao Jiang,et al. Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction , 2019, AAAI.
[27] Larry S. Davis,et al. WSLLN: Weakly Supervised Natural Language Localization Networks , 2019, EMNLP.
[28] Jiebo Luo,et al. Localizing Natural Language in Videos , 2019, AAAI.
[29] Bin Jiang,et al. Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention , 2019, ICMR.
[30] Amit K. Roy-Chowdhury,et al. Weakly Supervised Video Moment Retrieval From Text Queries , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Chuang Gan,et al. Weakly Supervised Dense Event Captioning in Videos , 2018, NeurIPS.
[32] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Ramakant Nevatia,et al. MAC: Mining Activity Concepts for Language-Based Temporal Localization , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[34] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[35] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.
[36] Zhetao Li,et al. Three-Dimensional Attention-Based Deep Ranking Model for Video Highlight Detection , 2018, IEEE Transactions on Multimedia.
[37] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[39] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[41] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[42] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[43] Bernt Schiele,et al. Script Data for Attribute-Based Recognition of Composite Activities , 2012, ECCV.
[44] Kun-Juan Wei,et al. Point-Supervised Video Temporal Grounding , 2023, IEEE Transactions on Multimedia.
[45] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.