Harnessing Object and Scene Semantics for Large-Scale Video Understanding
Yu-Gang Jiang | Yanwei Fu | Leonid Sigal | Zuxuan Wu