NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text
Muhammet Bastan | Alex ChiChung Kot | Chen Zhuo | Zhao Heng | Jiuxiang Gu | Dennis Sng | Xiangxi Shi
[1] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[2] Shin'ichi Satoh, et al. Consensus-based Sequence Training for Video Captioning, 2017, ArXiv.
[3] Tao Mei, et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Gang Wang, et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Sanja Fidler, et al. Order-Embeddings of Images and Language, 2015, ICLR.
[7] David J. Fleet, et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017, BMVC.
[8] George Awad, et al. On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017, 2018, IEEE Transactions on Multimedia.
[9] Jonathan G. Fiscus, et al. TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search, 2018, TRECVID.
[10] Walter Daelemans, et al. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, EMNLP 2014.
[11] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Xi Chen, et al. Stacked Cross Attention for Image-Text Matching, 2018, ECCV.
[13] Peter Young, et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions, 2014, TACL.
[14] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.
[15] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[16] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[17] William B. Dolan, et al. Collecting Highly Parallel Data for Paraphrase Evaluation, 2011, ACL.
[18] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).