Zijian Gao | Hao Zhang | Sheng Chen | Jingyu Liu | Dedan Chang | Jinwei Yuan
[1] Aleksandr Petiushko, et al. MDMMT: Multidomain Multimodal Transformer for Video Retrieval, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[2] Yonatan Bisk, et al. TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Hugo Terashima-Marín, et al. A Straightforward Framework For Video Retrieval Using CLIP, 2021, MCPR.
[4] Gunhee Kim, et al. A Joint Sequence Fusion Model for Video Question Answering and Retrieval, 2018, ECCV.
[5] Nan Duan, et al. CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval, 2021, Neurocomputing.
[6] Yang Liu, et al. Use What You Have: Video retrieval using representations from collaborative experts, 2019, BMVC.
[7] Ivan Laptev, et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Tao Mei, et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Florian Metze, et al. Support-set bottlenecks for video-text representation learning, 2020, ICLR.
[10] Andrew Zisserman, et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval, 2021, ArXiv.
[11] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[12] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[13] Kaiming He, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Pengfei Xiong, et al. CLIP2Video: Mastering Video-Text Retrieval via Image CLIP, 2021, ArXiv.
[15] Shengsheng Qian, et al. HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Fan Yang, et al. Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss, 2021, ArXiv.
[17] Chen Sun, et al. Multi-modal Transformer for Video Retrieval, 2020, ECCV.
[18] Junnan Li, et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, 2021, NeurIPS.