[1] Quoc V. Le, et al. QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension, 2018, ICLR.
[2] Hao Zhang, et al. Span-based Localizing Network for Natural Language Video Localization, 2020, ACL.
[3] Fuzheng Zhang, et al. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer, 2021, ACL.
[4] Mohit Bansal, et al. TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval, 2020, ECCV.
[5] Liangli Zhen, et al. Video Corpus Moment Retrieval with Contrastive Learning, 2021, SIGIR.
[6] Ming Zhao, et al. A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus, 2020, arXiv.
[7] Zhe Gan, et al. HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training, 2020, EMNLP.
[8] Liangli Zhen, et al. Parallel Attention Network with Sequence Matching for Video Grounding, 2021, Findings of ACL.
[9] Zhe Gan, et al. VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation, 2021, NeurIPS Datasets and Benchmarks.
[10] Bernard Ghanem, et al. Temporal Localization of Moments in Video Collections with Natural Language, 2019, arXiv.