Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework
暂无分享,去创建一个
Wei Chen | Ran Xu | Jason J. Corso | Caiming Xiong | Caiming Xiong | Wei Chen | Ran Xu
[1] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[2] Naftali Tishby,et al. Distributional Clustering of English Words , 1993, ACL.
[3] Christoph Goller,et al. Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).
[4] Alan F. Murray,et al. IEEE International Conference on Neural Networks , 1997 .
[5] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.
[6] Ted Pedersen,et al. WordNet::Similarity - Measuring the Relatedness of Concepts , 2004, NAACL.
[7] Larry S. Davis,et al. Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[8] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[9] Yasuo Kuniyoshi,et al. Evaluation of dimensionality reduction methods for image auto-annotation , 2010, BMVC.
[10] Hao Su,et al. Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.
[11] Fei-Fei Li,et al. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[12] B. S. Manjunath,et al. Video Annotation Through Search and Graph Reinforcement Mining , 2010, IEEE Transactions on Multimedia.
[13] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[14] Zi Huang,et al. Multiple feature hashing for real-time large scale near-duplicate video retrieval , 2011, ACM Multimedia.
[15] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.
[16] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[17] Raymond J. Mooney,et al. Improving Video Activity Recognition using Object Recognition and Text Mining , 2012, ECAI.
[18] Sven J. Dickinson,et al. Video In Sentences Out , 2012, UAI.
[19] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[20] Jason J. Corso,et al. Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Fei-Fei Li,et al. Video Event Understanding Using Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[22] Kate Saenko,et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge , 2013, AAAI.
[23] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[24] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[25] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[26] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Chong-Wah Ngo,et al. Annotation for free: video tagging by mining user search behavior , 2013, ACM Multimedia.
[28] Pradipto Das,et al. Translating related words to videos and back through latent topics , 2013, WSDM.
[29] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[30] Bernt Schiele,et al. Translating Video Content to Natural Language Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.
[31] Mubarak Shah,et al. Spatiotemporal Deformable Part Models for Action Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Weiyu Zhang,et al. From Actemes to Action: A Strongly-Supervised Representation for Detailed Action Understanding , 2013, 2013 IEEE International Conference on Computer Vision.
[33] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[34] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[35] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[36] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.