Event-aware Video Corpus Moment Retrieval
[1] Jinpeng Wang,et al. GMMFormer: Gaussian-Mixture-Model based Transformer for Efficient Partially Relevant Video Retrieval , 2023, AAAI.
[2] Fumin Shen,et al. Progressive Event Alignment Network for Partial Relevant Video Retrieval , 2023, 2023 IEEE International Conference on Multimedia and Expo (ICME).
[3] Jae-Pil Heo,et al. Query-Dependent Video Representation for Moment Retrieval and Highlight Detection , 2023, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Dahyun Kim,et al. Selective Query-Guided Debiasing for Video Corpus Moment Retrieval , 2022, ECCV.
[5] Tan Yu,et al. Cross-Probe BERT for Fast Cross-Modal Search , 2022, SIGIR.
[6] Seon Joo Kim,et al. UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Tsu-Jui Fu,et al. VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling , 2021, ArXiv.
[8] Guangyi Xiao,et al. Fine-grained Cross-modal Alignment Network for Text-Video Retrieval , 2021, ACM Multimedia.
[9] Dmytro Okhonko,et al. VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding , 2021, EMNLP.
[10] Jun Xiao,et al. Natural Language Video Localization with Learnable Moment Proposals , 2021, EMNLP.
[11] Chong-Wah Ngo,et al. CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval , 2021, ACM Multimedia.
[12] Mike Zheng Shou,et al. On Pursuit of Designing Multi-modal Transformer for Video Grounding , 2021, EMNLP.
[13] Heng Tao Shen,et al. Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Florian Metze,et al. VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding , 2021, FINDINGS.
[15] Dan Guo,et al. Proposal-Free Video Grounding with Contextual Pyramid Network , 2021, AAAI.
[16] Liangli Zhen,et al. Video Corpus Moment Retrieval with Contrastive Learning , 2021, SIGIR.
[17] Linchao Zhu,et al. T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Andrew Zisserman,et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Ivan Laptev,et al. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Jianfeng Dong,et al. Context-aware Biaffine Localizing Network for Temporal Sentence Grounding , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Zhe Gan,et al. Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Weiyao Wang,et al. Generic Event Boundary Detection: A Benchmark for Event Segmentation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Ming Zhao,et al. A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus , 2020, ArXiv.
[24] Thomas Brox,et al. COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning , 2020, NeurIPS.
[25] Chen Sun,et al. Multi-modal Transformer for Video Retrieval , 2020, ECCV.
[26] James R. Glass,et al. AVLnet: Learning Audio-Visual Language Representations from Instructional Videos , 2020, Interspeech.
[27] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[28] Licheng Yu,et al. Hero: Hierarchical Encoder for Video+Language Omni-representation Pre-training , 2020, EMNLP.
[29] Danqi Chen,et al. Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.
[30] Long Chen,et al. Rethinking the Bottom-Up Framework for Query-Based Video Localization , 2020, AAAI.
[31] Hao Zhang,et al. Span-based Localizing Network for Natural Language Video Localization , 2020, ACL.
[32] Shizhe Chen,et al. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[34] Mohit Bansal,et al. TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval , 2020, ECCV.
[35] Andrew Zisserman,et al. End-to-End Learning of Visual Representations From Uncurated Instructional Videos , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Yang Liu,et al. Use What You Have: Video retrieval using representations from collaborative experts , 2019, BMVC.
[37] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[38] Jiebo Luo,et al. Localizing Natural Language in Videos , 2019, AAAI.
[39] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[41] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Qi Tian,et al. Cross-modal Moment Localization in Videos , 2018, ACM Multimedia.
[44] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[45] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[46] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.
[47] Quoc V. Le,et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension , 2018, ICLR.
[48] Christopher Clark,et al. Simple and Effective Multi-Paragraph Reading Comprehension , 2017, ACL.
[49] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[51] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[52] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Jason Weston,et al. Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.
[54] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[56] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[57] Hugo Larochelle,et al. A Neural Autoregressive Topic Model , 2012, NIPS.
[58] Jeffrey M. Zacks,et al. Event perception , 2011, Scholarpedia.
[59] David L. Chen,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[60] Bingshu Wang,et al. Cross-Modality Knowledge Calibration Network for Video Corpus Moment Retrieval , 2024, IEEE Transactions on Multimedia.
[61] Yilong Yin,et al. Video Corpus Moment Retrieval via Deformable Multigranularity Feature Fusion and Adversarial Training , 2023, IEEE Transactions on Circuits and Systems for Video Technology.
[62] Ping Li,et al. Inflate and Shrink: Enriching and Reducing Interactions for Fast Text-Image Retrieval , 2021, EMNLP.
[63] Jie Lei,et al. Detecting Moments and Highlights in Videos via Natural Language Queries , 2021, NeurIPS.
[64] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.