An Analysis of Action Recognition Datasets for Language and Vision Tasks
[1] Svetlana Lazebnik, et al. Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering, 2016, ECCV.
[2] Jiaxuan Wang, et al. HICO: A Benchmark for Recognizing Human-Object Interactions in Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[3] Nazli Ikizler-Cinbis, et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures, 2016, J. Artif. Intell. Res.
[4] Leonidas J. Guibas, et al. Human action recognition by learning bases of action attributes and parts, 2011, 2011 International Conference on Computer Vision.
[5] Ali Farhadi, et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, 2016, ECCV.
[6] Michael S. Bernstein, et al. Visual Relationship Detection with Language Priors, 2016, ECCV.
[7] Raffaella Bernardi, et al. TUHOI: Trento Universal Human Object Interaction Dataset, 2014, VL@COLING.
[8] Ivan Laptev, et al. Recognizing human actions in still images: a study of bag-of-features and part-based representations, 2010, BMVC.
[9] Frank Keller, et al. Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings, 2016, NAACL.
[10] Beth Levin, et al. English Verb Classes and Alternations: A Preliminary Investigation, 1993.
[11] Yiannis Aloimonos, et al. Corpus-Guided Sentence Generation of Natural Images, 2011, EMNLP.
[12] Raffaella Bernardi, et al. Exploiting language models to recognize unseen actions, 2013, ICMR '13.
[13] Luc Van Gool, et al. The Pascal Visual Object Classes Challenge: A Retrospective, 2014, International Journal of Computer Vision.
[14] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[15] Pinar Duygulu Sahin, et al. Recognizing actions from still images, 2008, 2008 19th International Conference on Pattern Recognition.
[16] John B. Lowe, et al. The Berkeley FrameNet Project, 1998, ACL.
[17] David A. Forsyth, et al. Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis, 2005, Found. Trends Comput. Graph. Vis.
[18] Bernard Ghanem, et al. ActivityNet: A large-scale video benchmark for human activity understanding, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Pietro Perona, et al. Describing Common Human Visual Actions in Images, 2015, BMVC.
[20] Yann LeCun, et al. Convolutional Learning of Spatio-temporal Features, 2010, ECCV.
[21] Fei-Fei Li, et al. What, where and who? Classifying events by scene and object recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.
[22] Francis Ferraro, et al. A Survey of Current Datasets for Vision and Language Research, 2015, EMNLP.
[23] Larry S. Davis, et al. Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[24] Kate Saenko, et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild, 2014, COLING.
[25] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.
[26] Jitendra Malik, et al. Visual Semantic Role Labeling, 2015, ArXiv.
[27] Ali Farhadi, et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Bolei Zhou, et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Licheng Yu, et al. Visual Madlibs: Fill in the Blank Description Generation and Question Answering, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[30] David G. Lowe, et al. Object recognition from local scale-invariant features, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.
[31] Bernt Schiele, et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Fei-Fei Li, et al. Grouplet: A structured image representation for recognizing human and object interactions, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[33] Samy Bengio, et al. Learning semantic relationships for better action retrieval in images, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Mitchell P. Marcus, et al. OntoNotes: The 90% Solution, 2006, NAACL.
[35] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[36] Simone Paolo Ponzetto, et al. BabelNet: Building a Very Large Multilingual Semantic Network, 2010, ACL.
[37] Martha Palmer, et al. VerbNet: a broad-coverage, comprehensive verb lexicon, 2005.
[38] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[39] Changsong Liu, et al. Grounded Semantic Role Labeling, 2016, NAACL.
[40] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[41] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[42] Marie-Francine Moens, et al. Multi-Modal Representations for Improved Bilingual Lexicon Learning, 2016, ACL.
[43] Hans-Hellmut Nagel, et al. A vision of 'vision and language' comprises action: An example from road traffic, 2004, Artificial Intelligence Review.
[44] Nazli Ikizler-Cinbis, et al. Object, Scene and Actions: Combining Multiple Features for Human Action Recognition, 2010, ECCV.
[45] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[46] James F. O'Brien, et al. Computational Studies of Human Motion, 2006.