Dialogue-to-Video Retrieval