Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval
Tat-Seng Chua | Jianfeng Dong | Meng Wang | Xun Wang | Yixin Cao | Xun Yang
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[3] Victor O. K. Li,et al. Neural Machine Translation with Gumbel-Greedy Decoding , 2017, AAAI.
[4] Marcel Worring,et al. Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr.
[5] Jihun Choi,et al. Learning to Compose Task-Specific Tree Structures , 2017, AAAI.
[6] Xirong Li,et al. Dual Encoding for Zero-Example Video Retrieval , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[8] Meng Wang,et al. Learning Visual Semantic Relationships for Efficient Visual Retrieval , 2015, IEEE Transactions on Big Data.
[9] Xu Chen,et al. Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision , 2018, EMNLP.
[10] Xirong Li,et al. W2VV++: Fully Deep Learning for Ad-hoc Video Search , 2019, ACM Multimedia.
[11] Wang Ling,et al. Memory Architectures in Recurrent Neural Network Language Models , 2018, ICLR.
[12] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[13] Lior Wolf,et al. Associating neural word embeddings with deep image representations using Fisher Vectors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Leonid Sigal,et al. Learning Language-Visual Embedding for Movie Understanding with Natural-Language , 2016, ArXiv.
[15] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Meng Wang,et al. Coherent Semantic-Visual Indexing for Large-Scale Image Retrieval in the Cloud , 2017, IEEE Transactions on Image Processing.
[17] Jongwook Choi,et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Ioannis Patras,et al. Query and Keyframe Representations for Ad-hoc Video Search , 2017, ICMR.
[19] Gunhee Kim,et al. A Joint Sequence Fusion Model for Video Question Answering and Retrieval , 2018, ECCV.
[20] Amit K. Roy-Chowdhury,et al. Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval , 2018, ICMR.
[21] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[22] Meng Wang,et al. Learning concept bundles for video search with complex queries , 2011, MM '11.
[23] Yu Qiao,et al. Find and Focus: Retrieve and Localize Video Events with Natural Language Queries , 2018, ECCV.
[24] Ivan Laptev,et al. Learning a Text-Video Embedding from Incomplete and Heterogeneous Data , 2018, ArXiv.
[25] Ivan Laptev,et al. Learning from Video and Text via Large-Scale Discriminative Clustering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Xu Chen,et al. Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding , 2017, ACL.
[27] Liang Wang,et al. Improving Description-Based Person Re-Identification by Multi-Granularity Image-Text Alignments , 2019, IEEE Transactions on Image Processing.
[28] Yiannis Kompatsiaris,et al. ITI-CERTH participation in TRECVID 2018 , 2017, TRECVID.
[29] Kevin Gimpel,et al. Visually Grounded Neural Syntax Acquisition , 2019, ACL.
[30] Dima Damen,et al. Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Xirong Li,et al. Predicting Visual Features From Text for Image and Video Caption Retrieval , 2017, IEEE Transactions on Multimedia.
[32] Wei Chen,et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework , 2015, AAAI.
[33] Xiaogang Wang,et al. Person Search with Natural Language Description , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[35] Tal Hassner,et al. Temporal Tessellation: A Unified Approach for Video Analysis , 2016, ICCV.
[36] Qing Li,et al. VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking , 2017, TRECVID.
[37] Ivan Laptev,et al. HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Tetsuji Ogawa,et al. Waseda_Meisei at TRECVID 2018: Ad-hoc Video Search , 2018, TRECVID.
[39] Yiannis Kompatsiaris,et al. ITI-CERTH participation to TRECVID 2015 , 2015, TRECVID.
[40] Jascha Sohl-Dickstein,et al. Capacity and Trainability in Recurrent Neural Networks , 2016, ICLR.
[41] Meng Wang,et al. Person Re-Identification With Metric Learning Using Privileged Information , 2018, IEEE Transactions on Image Processing.
[42] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Sanja Fidler,et al. Visual Semantic Search: Retrieving Videos via Complex Textual Queries , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[45] Meng Wang,et al. Utilizing Related Samples to Enhance Interactive Concept-Based Video Search , 2011, IEEE Transactions on Multimedia.
[46] Qi Tian,et al. Enhancing Person Re-identification in a Self-Trained Subspace , 2017, ACM Trans. Multim. Comput. Commun. Appl.
[47] Xirong Li,et al. University of Amsterdam and Renmin University at TRECVID 2016: Searching Video, Detecting Events and Describing Video , 2016, TRECVID.
[48] Duy-Dinh Le,et al. NII-HITACHI-UIT at TRECVID 2017 , 2016, TRECVID.
[49] Rogério Schmidt Feris,et al. Dialog-based Interactive Image Retrieval , 2018, NeurIPS.
[50] Jongwook Choi,et al. Video Captioning and Retrieval Models with Semantic Attention , 2016, ArXiv.
[51] Qi Tian,et al. Video-Based Cross-Modal Recipe Retrieval , 2019, ACM Multimedia.
[52] Christopher D. Manning,et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.
[53] Qi Tian,et al. Cross-modal Moment Localization in Videos , 2018, ACM Multimedia.
[54] Xirong Li. Deep Learning for Video Retrieval by Natural Language , 2019 .