Dual Encoding for Zero-Example Video Retrieval
Xirong Li | Jianfeng Dong | Chaoxi Xu | Yuan He | Shouling Ji | Xun Wang | Gang Yang