Progressive Semantic Matching for Video-Text Retrieval
Fanhua Shang | Yuanyuan Liu | Hongying Liu | Ruyi Luo | Mantang Niu
[1] Chong-Wah Ngo, et al. Practical elimination of near-duplicates from web video search, 2007, ACM Multimedia.
[2] Leonid Sigal, et al. Learning Language-Visual Embedding for Movie Understanding with Natural-Language, 2016, ArXiv.
[3] Gunhee Kim, et al. A Joint Sequence Fusion Model for Video Question Answering and Retrieval, 2018, ECCV.
[4] Ivan Laptev, et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Tao Mei, et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Xianglong Liu, et al. Graph Convolutional Network Hashing for Cross-Modal Retrieval, 2019, IJCAI.
[7] Yang Liu, et al. Use What You Have: Video retrieval using representations from collaborative experts, 2019, BMVC.
[8] Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.
[9] Shizhe Chen, et al. Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[12] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[13] Trevor Darrell, et al. Sequence to Sequence -- Video to Text, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Changsheng Xu, et al. Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval, 2020, IEEE Transactions on Multimedia.
[15] Wei Chen, et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework, 2015, AAAI.
[16] Tat-Seng Chua, et al. Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval, 2020, SIGIR.
[17] Wei Wang, et al. Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval, 2020, IEEE Transactions on Multimedia.
[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[19] Ivan Laptev, et al. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data, 2018, ArXiv.
[20] Yale Song, et al. TGIF: A New Dataset and Benchmark on Animated GIF Description, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Yale Song, et al. Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Tao Mei, et al. Jointly Modeling Embedding and Translation to Bridge Video and Language, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Cees G. M. Snoek, et al. Shuffled ImageNet Banks for Video Event Detection and Search, 2020, ACM Trans. Multim. Comput. Commun. Appl.
[25] Dima Damen, et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.
[27] Ruifan Li, et al. Cross-modal Retrieval with Correspondence Autoencoder, 2014, ACM Multimedia.
[28] Marc'Aurelio Ranzato, et al. DeViSE: A Deep Visual-Semantic Embedding Model, 2013, NIPS.
[29] Amit K. Roy-Chowdhury, et al. Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval, 2018, ICMR.
[30] David J. Fleet, et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017, BMVC.
[31] Xirong Li, et al. W2VV++: Fully Deep Learning for Ad-hoc Video Search, 2019, ACM Multimedia.
[32] Xirong Li, et al. Predicting Visual Features From Text for Image and Video Caption Retrieval, 2017, IEEE Transactions on Multimedia.
[33] Fei Wang, et al. Million-scale near-duplicate video retrieval system, 2011, ACM Multimedia.
[34] Xirong Li, et al. Dual Encoding for Zero-Example Video Retrieval, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).