Share-and-Chat: Achieving Human-Level Video Commenting by Search and Multi-View Embedding
Tao Mei | Hongyang Chao | Ting Yao | Yong Rui | Yehao Li
[1] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[2] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[3] Tao Mei,et al. Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure , 2016, IJCAI.
[4] Tao Mei,et al. Correlative multi-label video annotation , 2007, ACM Multimedia.
[5] Tao Chen,et al. DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks , 2014, ArXiv.
[6] Shipeng Li,et al. Query-driven iterated neighborhood graph search for large scale indexing , 2012, ACM Multimedia.
[7] Lorenzo Torresani,et al. C3D: Generic Features for Video Analysis , 2014, ArXiv.
[8] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[9] Tao Mei,et al. Deep Semantic-Preserving and Ranking-Based Hashing for Image Retrieval , 2016, IJCAI.
[10] Joseph P. Romano. On the behaviour of randomization tests without the group invariance assumption , 1990 .
[11] Nello Cristianini,et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .
[12] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[13] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[14] Chong-Wah Ngo,et al. Annotation for free: video tagging by mining user search behavior , 2013, ACM Multimedia.
[15] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[16] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[17] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[18] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[19] Rongrong Ji,et al. Large-scale visual sentiment ontology and detectors using adjective noun pairs , 2013, ACM Multimedia.
[20] Svetlana Lazebnik,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2014 .
[21] Meng Wang,et al. Beyond Distance Measurement: Constructing Neighborhood Similarity for Video Annotation , 2009, IEEE Transactions on Multimedia.
[22] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[23] Chong-Wah Ngo,et al. Learning Query and Image Similarities with Ranking Canonical Correlation Analysis , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] René Hen,et al. Dorsal vs Ventral Hippocampal Neurogenesis: Implications for Cognition and Mood , 2011, Neuropsychopharmacology.
[25] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[26] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[28] Eric Gilbert,et al. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text , 2014, ICWSM.
[29] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[30] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[31] Tao Chen,et al. Predicting Viewer Affective Comments Based on Image Content in Social Media , 2014, ICMR.
[32] Dong Liu,et al. EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video , 2015, ACM Multimedia.
[33] Saurabh Gupta,et al. Exploring Nearest Neighbor Approaches for Image Captioning , 2015, ArXiv.
[34] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[35] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Mark Sanderson,et al. Automatic video tagging using content redundancy , 2009, SIGIR.