Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding
暂无分享,去创建一个
Yongdong Zhang | Feng Wu | Wenfei Yang | Tianzhu Zhang | Feng Wu | Tianzhu Zhang | Yongdong Zhang | Wenfei Yang
[1] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[2] Bohyung Han,et al. Local-Global Video-Text Interactions for Temporal Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Trevor Darrell,et al. Modeling Relationships in Referential Expressions with Compositional Modular Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Jiebo Luo,et al. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language , 2019, AAAI.
[5] Sergey Levine,et al. Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[6] Jiebo Luo,et al. Utilizing semantic word similarity measures for video retrieval , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[7] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[10] Wenhan Luo,et al. Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video , 2020, ArXiv.
[11] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[12] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[13] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.
[14] Jungong Han,et al. Saliency-Guided Attention Network for Image-Sentence Matching , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Zhou Zhao,et al. Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos , 2019, SIGIR.
[16] Qingming Huang,et al. Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Runhao Zeng,et al. Dense Regression Network for Video Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Liqiang Nie,et al. Neural Multimodal Cooperative Learning Toward Micro-Video Understanding , 2020, IEEE Transactions on Image Processing.
[20] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.
[21] Hongdong Li,et al. Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[22] Shizhe Chen,et al. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[24] Zhou Yu,et al. Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos , 2020, ArXiv.
[25] Guanbin Li,et al. Linguistic Structure Guided Context Modeling for Referring Image Segmentation , 2020, ECCV.
[26] Liang Wang,et al. Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Xiao Liu,et al. Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos , 2019, AAAI.
[28] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[29] Chen Qian,et al. A Real-Time Cross-Modality Correlation Filtering Method for Referring Expression Comprehension , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Zhijie Lin,et al. Weakly-Supervised Video Moment Retrieval via Semantic Completion Network , 2020, AAAI.
[31] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Tao Mei,et al. Joint multi-label multi-instance learning for image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[33] Ji Liu,et al. IMRAM: Iterative Matching With Recurrent Attention Memory for Cross-Modal Image-Text Retrieval , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[35] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Larry S. Davis,et al. WSLLN:Weakly Supervised Natural Language Localization Networks , 2019, EMNLP.
[37] Yitian Yuan,et al. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Amit K. Roy-Chowdhury,et al. Weakly Supervised Video Moment Retrieval From Text Queries , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Max Welling,et al. Attention-based Deep Multiple Instance Learning , 2018, ICML.
[40] Jungong Han,et al. SMAN: Stacked Multimodal Attention Network for Cross-Modal Image–Text Retrieval , 2020, IEEE Transactions on Cybernetics.
[41] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[42] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[43] Yongdong Zhang,et al. Multi-Modality Cross Attention Network for Image and Sentence Matching , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[45] Subhransu Maji,et al. Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[46] Shih-Fu Chang,et al. Grounding Referring Expressions in Images by Variational Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Yunchao Wei,et al. Referring Image Segmentation via Cross-Modal Progressive Comprehension , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.